Abstract

We estimate linear functionals in the classical deconvolution problem by kernel estimators.
We obtain a uniform central limit theorem with $\sqrt{n}$-rate on the assumption that
the smoothness of the functionals is larger than the ill-posedness of the problem, which
is given by the polynomial decay rate of the characteristic function of the error. The limit
distribution is a generalized Brownian bridge with a covariance structure that depends on
the characteristic function of the error and on the functionals. The proposed estimators
are optimal in the sense of semiparametric efficiency. The class of linear functionals is
wide enough to incorporate the estimation of distribution functions. The proofs are based
on smoothed empirical processes and mapping properties of the deconvolution operator.
1 Introduction



Our observations are given by $n \in \mathbb{N}$ independent and identically distributed random variables

$Y_j = X_j + \varepsilon_j, \quad j = 1, \dots, n,$ (1)

where $X_j$ and $\varepsilon_j$ are independent of each other, the distribution of the errors $\varepsilon_j$ is supposed
to be known and the aim is statistical inference on the distribution of $X_j$. Let us denote the
densities of $X_j$ and $\varepsilon_j$ by $f_X$ and $f_\varepsilon$, respectively. We consider the case of ordinary smooth
errors, which means that the characteristic function $\varphi_\varepsilon$ of the errors $\varepsilon_j$ decays with polynomial
rate, determining the ill-posedness of the inverse problem. The contribution of this article
to the well-studied problem of deconvolution is twofold. First, we prove a uniform central
limit theorem for kernel estimators of the distribution function of $X_j$ in the setting of $\sqrt{n}$
convergence rates. More precisely, the theorem does not only include the estimation of the
distribution function but covers translation classes of linear functionals of the density $f_X$
whenever the ill-posedness is smaller than the smoothness of the functionals. Second, we
obtain more exact results than the minimax rates of convergence by showing that the
estimators used are optimal in the sense of semiparametric efficiency.

The classical Donsker theorem plays a central role in statistics and states that the empirical
distribution function of an independent, identically distributed sample converges uniformly
to the distribution function. In the deconvolution model (1) our Donsker theorem
states uniform convergence for an asymptotically unbiased estimator of translated functionals
$t \mapsto \vartheta_t := \int \zeta(x - t) f_X(x)\,dx$, where the special case $\zeta := \mathbb{1}_{(-\infty,0]}$ leads to the estimation
of the distribution function. This generalization allows us to consider functionals $\vartheta_t$ as long
as the smoothness of $\zeta$ in an $L^2$-Sobolev sense compensates the ill-posedness of the problem.
The limiting process $\mathbb{G}$ in the uniform central limit theorem is a generalized Brownian
bridge, whose covariance depends on the functional $\zeta$ and, through the deconvolution operator
$\mathcal{F}^{-1}[1/\varphi_\varepsilon]$, also on the distribution of the errors. The kernel estimators $\hat\vartheta_t$ used are minimax
optimal since they converge with a $\sqrt{n}$-rate. So investigating optimality further leads naturally
to the question whether the asymptotic variance of the estimators is minimal, as in the
case of the empirical distribution function in the classical Donsker theorem. We prove that
the estimator is efficient in the sense of a Hájek-Le Cam convolution theorem. In particular,
the asymptotic covariance matrices of the finite dimensional distributions achieve the
Cramér-Rao information bound. By uniform convergence and efficiency the kernel estimator
of $f_X$ fulfills the 'plug-in' property of Bickel and Ritov [2] in the deconvolution model (1).

The deconvolution problem has attracted much attention, so we mention here only closely
related works and refer the interested reader to the references therein. The classical works
by Fan [11, 12] contain asymptotic normality of kernel density estimators as well as minimax
convergence rates for estimating the density and the distribution function. Butucea and Comte
[5] have treated the data-driven choice of the bandwidth for estimating functionals of $f_X$ but
assumed some minimal smoothness and integrability conditions on the functional which
exclude, for example, $\zeta := \mathbb{1}_{(-\infty,0]}$ since it is not integrable. Dattner et al. [6] have studied
minimax-optimal and adaptive estimation of the distribution function. Asymptotic normality
of estimators for the distribution function has been shown by van Es and Uh [31] in the case
of supersmooth errors and by Hall and Lahiri [18] for ordinary smooth errors. In contrast, we
consider the estimation of general linear functionals and are interested in uniform convergence.
Uniform results have been studied for the density but not for the distribution function by
Bissantz et al. [3] and by Lounici and Nickl [21]. Recently, Nickl and Reiß [24] have proved a
Donsker theorem for estimators of the distribution function of a Lévy measure. Their situation
is related to but more involved than ours, owing to the nonlinearity and the auto-deconvolution
of the Lévy measure. In a deconvolution context we consider the more general problem of
estimating linear functionals efficiently, which contains the estimation of the distribution function
as a special case and provides clear insight into the interplay between the smoothness of $\zeta$ and the
ill-posedness of the problem. While efficiency has been investigated in various semiparametric
models, e.g., see Bickel et al. [1], to the best of the authors' knowledge there are no results in
this direction in the deconvolution framework. However, in the Lévy setting Nickl and Reiß
[24] have shown heuristically that their estimator achieves the lower bound of the variance
while a rigorous proof remained open.
In order to show the uniform central limit theorem in the deconvolution problem, we prove
that the empirical process $\sqrt{n}(P_n - P)$ is tight in the space of bounded functions acting on
the class

$\mathcal{G} := \{\mathcal{F}^{-1}[1/\varphi_\varepsilon(-\cdot)] * \zeta_t \mid t \in \mathbb{R}\}, \quad \zeta_t := \zeta(\cdot - t),$

where $P$ and $P_n = \frac{1}{n}\sum_{j=1}^n \delta_{Y_j}$ denote the true and the empirical probability measure of the
observations $Y_j$, respectively. Since $\mathcal{G}$ may consist of translates of an unbounded function, this
is in general not a Donsker class. Nevertheless, Radulovic and Wegkamp [26] have observed
that smoothed empirical processes might converge even when the unsmoothed process does
not. Giné and Nickl [14] have further developed these ideas and have shown uniform central
limit theorems for kernel density estimators. Nickl and Reiß [24] used smoothed empirical
processes in the inverse problem of estimating the distribution function of Lévy measures. In
order to show semiparametric efficiency in the deconvolution problem, the main problem is
to show that the efficient influence function is indeed an element of the tangent space. If the
regularity of $\zeta$ is small, the standard methods given in the monograph of Bickel et al. [1] do not
apply in this ill-posed problem. Instead, we approximate $\zeta$ by a sequence of smooth $(\zeta_n)$ and
show the convergence of the information bounds. Interestingly, this reveals a relation between
the intrinsic metric of the limit $\mathbb{G}$ and the metric which is induced by the inverse Fisher
information. In addition to techniques of smoothed empirical processes and the calculus
of information bounds, our proofs rely on the Fourier multiplier property of the underlying
deconvolution operator $\mathcal{F}^{-1}[1/\varphi_\varepsilon]$, which is related to pseudo-differential operators as noted in
the Lévy process setting by Nickl and Reiß [24] and in the deconvolution context by Schmidt-Hieber
et al. [27]. Important for our proofs are the mapping properties of $\mathcal{F}^{-1}[1/\varphi_\varepsilon]$ on Besov
spaces.

This paper is organized as follows: In Section 2 we formulate the Donsker theorem and 
discuss its consequences. Efficiency is then considered in Section 3. All proofs are deferred to 
Sections 4 and 5. In the Appendix we summarize definitions and properties of the function 
spaces used in the paper. 



2 Uniform central limit theorem 
2.1 The estimator 

According to the observation scheme (1), the $Y_j$ are distributed with density $f_Y = f_X * f_\varepsilon$
determining the probability measure $P$. The characteristic function $\varphi$ of $P$ can be estimated by
its empirical version $\varphi_n(u) = \frac{1}{n}\sum_{j=1}^n e^{iuY_j}$, $u \in \mathbb{R}$. For $\zeta$ to be specified later and recalling
$\zeta_t = \zeta(\cdot - t)$, our aim is to estimate functionals of the form

$\vartheta_t := \langle \zeta_t, f_X \rangle = \int \zeta_t(x) f_X(x)\,dx.$ (2)

Defining the Fourier transform by $\mathcal{F} f(u) := \int e^{iux} f(x)\,dx$, $u \in \mathbb{R}$, the natural estimator of
the functional $\vartheta_t$ is given by

$\hat\vartheta_t := \int \zeta_t(x)\, \mathcal{F}^{-1}\Big[\mathcal{F}K_h \frac{\varphi_n}{\varphi_\varepsilon}\Big](x)\,dx,$ (3)

where $K$ is a kernel, $h > 0$ the bandwidth and we have written as usual $K_h(x) = h^{-1} K(x/h)$.
Choosing $\mathcal{F}K = \mathbb{1}_{[-\pi,\pi]}$ for some $\pi > 0$ leads to the estimator proposed by Butucea and
Comte [5]. Throughout, we suppose that
(i) $K \in L^1(\mathbb{R}) \cap L^\infty(\mathbb{R})$ is symmetric and band-limited with $\operatorname{supp}(\mathcal{F}K) \subseteq [-1,1]$,

(ii) for $l = 1, \dots, L$

$\int K = 1, \quad \int x^l K(x)\,dx = 0, \quad \int |x^{L+1} K(x)|\,dx < \infty$ and (4)

(iii) $K \in C^1(\mathbb{R})$ satisfies, denoting $\langle x \rangle := (1 + x^2)^{1/2}$,

$|K(x)| + |K'(x)| \lesssim \langle x \rangle^{-2}.$ (5)

Throughout, we write $A_p \lesssim B_p$ if there exists a constant $C > 0$ independent of the parameter
$p$ such that $A_p \le C B_p$. If $A_p \lesssim B_p$ and $B_p \lesssim A_p$, we write $A_p \sim B_p$. Examples of such kernels
can be obtained by taking $\mathcal{F}K$ to be a symmetric function in $C^\infty(\mathbb{R})$ which is supported in
$[-1,1]$ and constantly equal to one in a neighborhood of zero. The resulting kernels are called flat
top kernels and were used in deconvolution problems, for example, by Bissantz et al. [3].
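To make the construction of a flat top kernel concrete, the following Python sketch builds one possible $\mathcal{F}K$ and recovers $K$ by numerical Fourier inversion. It is only an illustration under our own assumptions: the plateau $[-1/2,1/2]$, the transition profile $e^{-1/z}$ and the names `flat_top_ft` and `kernel` are hypothetical choices and are not taken from [3].

```python
import math


def flat_top_ft(u):
    """F K(u) of an illustrative flat top kernel: 1 on [-1/2, 1/2], 0 outside (-1, 1)."""
    a = abs(u)
    if a <= 0.5:
        return 1.0
    if a >= 1.0:
        return 0.0
    s = (a - 0.5) / 0.5                      # s runs through (0, 1) on the transition zone
    up = math.exp(-1.0 / (1.0 - s))
    down = math.exp(-1.0 / s)
    return up / (up + down)                  # smooth, monotone decay from 1 to 0


def kernel(x, n_grid=4000):
    """K(x) = (1/(2*pi)) * int_{-1}^{1} cos(u*x) F K(u) du by the midpoint rule (F K is even)."""
    du = 2.0 / n_grid
    total = 0.0
    for k in range(n_grid):
        u = -1.0 + (k + 0.5) * du
        total += math.cos(u * x) * flat_top_ft(u)
    return total * du / (2.0 * math.pi)
```

Since $\mathcal{F}K$ equals one near zero, all derivatives of $\mathcal{F}K$ vanish at the origin, which is the Fourier-side form of the moment conditions in (4); for this particular profile, $K(0) = \frac{1}{2\pi}\int \mathcal{F}K = 1.5/(2\pi)$.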

2.2 Statement of the theorem 

Given a function $\zeta$ specified later, our aim is to show a Donsker theorem for the estimator over
the class of translations $\zeta_t$, $t \in \mathbb{R}$. In view of the classical Donsker theorem in a model without
additive errors, where no assumptions on the smoothness of the distribution are needed, we
want to assume as little smoothness of $f_X$ as possible while still guaranteeing $\sqrt{n}$-rates. For some
$\delta > 0$ the following assumptions on the density $f_X$ will be needed:

Assumption 1.

(i) Let $f_X$ be bounded and assume the moment condition $\int |x|^{2+\delta} f_X(x)\,dx < \infty$.
(ii) Assume $f_X \in H^\alpha(\mathbb{R})$, that is, the density has Sobolev smoothness of order $\alpha \ge 0$.

We refer to the appendix for an exact definition of the Sobolev space $H^\alpha(\mathbb{R})$. Boundedness
of the observation density $f_Y$ follows immediately from (i) since $\|f_Y\|_\infty \le \|f_X\|_\infty \|f_\varepsilon\|_{L^1} < \infty$.
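In addition to the smoothness of $f_X$, the smoothness of $\zeta$ will be crucial. We assume for
$\gamma_s, \gamma_c > 0$

$\zeta \in Z^{\gamma_s,\gamma_c} := \Big\{ \zeta = \zeta^c + \zeta^s \,\Big|\, \zeta^s \in H^{\gamma_s}(\mathbb{R})$ is compactly supported as well
as $\langle x \rangle^\tau (\zeta^c(x) - a(x)) \in H^{\gamma_c}(\mathbb{R})$ for some $\tau > 0$ and (6)
some $a \in C^\infty(\mathbb{R})$ such that $a'$ is compactly supported $\Big\}$

and write for $\zeta \in Z^{\gamma_s,\gamma_c}$ with a given decomposition $\zeta = \zeta^s + \zeta^c$

$\|\zeta\|_{Z^{\gamma_s,\gamma_c}} := \|\zeta^s\|_{H^{\gamma_s}} + \big\|\tfrac{1}{ix+1}\zeta^c(x)\big\|_{H^{\gamma_c}},$

which is finite since $\|\tfrac{1}{ix+1}\zeta^c(x)\|_{H^{\gamma_c}}$ is bounded by
$\|\tfrac{a(x)}{ix+1}\|_{H^{\gamma_c}} + \|\tfrac{1}{(ix+1)\langle x\rangle^\tau}\|_{C^s} \|\langle x \rangle^\tau(\zeta^c(x) - a(x))\|_{H^{\gamma_c}} < \infty$
for any $s > \gamma_c$. Several examples for $\zeta$ and corresponding $\gamma_s, \gamma_c$ will be given
in Examples 1-3 below. In particular, $\mathbb{1}_{(-\infty,0]} \in Z^{\gamma_s,\gamma_c}$ for $\gamma_s < 1/2$. The ill-posedness of the
problem is determined by the decay of the characteristic function of the errors. More precisely,
we suppose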
problem is determined by the decay of the characteristic function of the errors. More precisely, 
we suppose 
Assumption 2. Let the error distribution satisfy

(i) $\int |x|^{2+\delta} f_\varepsilon(x)\,dx < \infty$, thus $\varphi_\varepsilon$ is twice continuously differentiable, and

(ii) $|(\varphi_\varepsilon^{-1})'(u)| \lesssim \langle u \rangle^{\beta-1}$ for some $\beta > 0$, in particular $|\varphi_\varepsilon^{-1}(u)| \lesssim \langle u \rangle^{\beta}$, $u \in \mathbb{R}$.

Throughout, we write $\varphi_\varepsilon^{-1} = 1/\varphi_\varepsilon$. Assumption (ii) on the distribution of the errors is similar
to the classical decay assumption by Fan [11] and it is fulfilled for many ordinary smooth
error laws such as gamma or Laplace distributions as discussed below. Assumption 2(ii) implies
that $\varphi_\varepsilon^{-1}$ is a Fourier multiplier on Besov spaces so that

$B^s_{p,q}(\mathbb{R}) \to B^{s-\beta}_{p,q}(\mathbb{R}), \quad f \mapsto \mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}f],$

for $p,q \in [1,\infty]$, $s \in \mathbb{R}$, is a continuous linear map, which is essential in our proofs, compare
Lemma 5. In the same spirit Schmidt-Hieber et al. [27] discuss the behavior of the
deconvolution operator as a pseudo-differential operator. We define

$g_t := \mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)] * \zeta_t$ and $\mathcal{G} = \{g_t \mid t \in \mathbb{R}\}.$ (7)

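Note that in general $g_t$ may only exist in a distributional sense, but on Assumption 2 and for
$\zeta \in Z^{\gamma_s,\gamma_c}$ it can be rigorously interpreted by (see (19))

$g_t(x) = \mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-u)\mathcal{F}\zeta_t^s(u)](x)$

$\quad + (1 + ix)\,\mathcal{F}^{-1}\big[\varphi_\varepsilon^{-1}(-u)\,\mathcal{F}[\tfrac{1}{1+iy}\zeta_t^c(y)](u)\big](x)$

$\quad + \mathcal{F}^{-1}\big[(\varphi_\varepsilon^{-1})'(-u)\,\mathcal{F}[\tfrac{1}{1+iy}\zeta_t^c(y)](u)\big](x),$

which indicates why we have imposed an assumption on $(\varphi_\varepsilon^{-1})'$ and have defined $\|\cdot\|_{Z^{\gamma_s,\gamma_c}}$ as
above.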

It will turn out that $\mathcal{G}$ is $P$-pregaussian, but not Donsker in general. Denoting by $\lfloor \alpha \rfloor$ the
largest integer smaller than or equal to $\alpha$ and defining convergence in law on $\ell^\infty(\mathbb{R})$ as Dudley [9,
p. 94], we state our main result.

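Theorem 1. Grant Assumptions 1 and 2 as well as $\zeta \in Z^{\gamma_s,\gamma_c}$ with $\gamma_s > \beta$, $\gamma_c > (1/2 \vee \alpha) + \gamma_s$
and $\alpha + 3\gamma_s > 2\beta + 1$. Furthermore, let the kernel $K$ satisfy (4) with $L = \lfloor \alpha + \gamma_s \rfloor$. Let
$h_n^{2\alpha+2\gamma_s} n \to 0$ and if $\gamma_s \le \beta + 1/2$ let in addition $h_n^{\rho} n \to \infty$ for some $\rho > 4\beta - 4\gamma_s + 2$, then

$\sqrt{n}(\hat\vartheta_t - \vartheta_t)_{t \in \mathbb{R}} \xrightarrow{\mathcal{L}} \mathbb{G}$ in $\ell^\infty(\mathbb{R})$

as $n \to \infty$, where $\mathbb{G}$ is a centered Gaussian Borel random variable in $\ell^\infty(\mathbb{R})$ with covariance
function given by

$\Sigma_{s,t} := \int g_s(x) g_t(x)\,P(dx) - \vartheta_s \vartheta_t$
for $g_s, g_t$ defined in (7) and $s,t \in \mathbb{R}$.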

We illustrate the range of this theorem by the following examples. 

Example 1. We consider the indicator function $\mathbb{1}_{(-\infty,0]}(x)$, $x \in \mathbb{R}$. Let $a$ be a monotone
decreasing $C^\infty(\mathbb{R})$ function, which is for some $M > 0$ equal to zero for all $x \ge M$ and
equal to one for all $x \le -M$. We define $\zeta^s := \mathbb{1}_{(-\infty,0]} - a$ and $\zeta^c := a$. From the bounded
variation of $\zeta^s$ follows $\zeta^s \in B^1_{1,\infty}(\mathbb{R}) \subseteq H^{\gamma_s}(\mathbb{R})$ for any $\gamma_s < 1/2$ by Besov smoothness of
bounded variation functions (51) as well as by the Besov space embeddings (46) and (47).
Since $a \in C^\infty(\mathbb{R})$ and $a'$ is compactly supported, the condition on $\zeta^c$ is satisfied for any
$\gamma_c > 0$. Hence, $\mathbb{1}_{(-\infty,t]} \in Z^{\gamma_s,\gamma_c}$ if $\gamma_s < 1/2$. On the other hand, this cannot hold for $\gamma_s > 1/2$
since $H^{\gamma_s}(\mathbb{R}) \subseteq C^0(\mathbb{R})$ by Sobolev's embedding theorem or by (45), (46) and (47). Owing to
the condition $\gamma_s > \beta$, Assumption 2 needs to be fulfilled for some $\beta < 1/2$, which is done, for
example, by the gamma distribution $\Gamma(\beta, \eta)$ with $\beta \in (0, 1/2)$ and $\eta \in (0, \infty)$, that is

$f_\varepsilon(x) := \gamma_{\beta,\eta}(x) := \frac{1}{\Gamma(\beta)\eta^\beta}\, x^{\beta-1} e^{-x/\eta}\, \mathbb{1}_{[0,\infty)}(x),$

and $\varphi_\varepsilon(u) = (1 - i\eta u)^{-\beta}$, $u \in \mathbb{R}$.
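As a quick numerical sanity check of Assumption 2(ii) for gamma errors, the following Python sketch evaluates the ratios $|\varphi_\varepsilon^{-1}(u)|/\langle u\rangle^{\beta}$ and $|(\varphi_\varepsilon^{-1})'(u)|/\langle u\rangle^{\beta-1}$ on a grid. The function names and the grid are our own hypothetical choices; the closed form of the derivative follows from differentiating $(1 - i\eta u)^{\beta}$.

```python
import math


def gamma_cf(u, beta, eta):
    """Characteristic function (1 - i*eta*u)^(-beta) of the Gamma(beta, eta) error law."""
    return (1.0 - 1j * eta * u) ** (-beta)


def bracket(u):
    """The weight <u> = (1 + u^2)^(1/2)."""
    return math.sqrt(1.0 + u * u)


def inverse_cf_derivative(u, beta, eta):
    """Closed form of (1/phi_eps)'(u) = -i*eta*beta*(1 - i*eta*u)^(beta - 1)."""
    return -1j * eta * beta * (1.0 - 1j * eta * u) ** (beta - 1.0)


def assumption_ratios(beta, eta, grid):
    """Grid maxima of |1/phi_eps| / <u>^beta and |(1/phi_eps)'| / <u>^(beta - 1)."""
    r0 = max(abs(1.0 / gamma_cf(u, beta, eta)) / bracket(u) ** beta for u in grid)
    r1 = max(abs(inverse_cf_derivative(u, beta, eta)) / bracket(u) ** (beta - 1.0)
             for u in grid)
    return r0, r1
```

Both ratios stay bounded as the grid grows, which is what the symbol $\lesssim$ in Assumption 2(ii) requires; the bounds depend on $\beta$ and $\eta$ but not on $u$.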

Example 2. Let $\zeta_t(x) := \zeta_t^s(x) := \max(K - |x - t|, 0)$ and $\zeta^c(x) := 0$ with $K > 0$. The payoff
of the butterfly spread is described by such a function [13]. Then $\mathcal{F}\zeta(u) = 4\sin^2(Ku/2)/u^2$ and
$\zeta^s \in H^{\gamma_s}(\mathbb{R})$ for any $\gamma_s < 3/2$. So, Assumption 2 is required for some $\beta < 3/2$, which holds,
for example, for the chi-squared distribution with one or two degrees of freedom or for the
exponential distribution.
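The estimator (3) can be evaluated on the Fourier side, since Plancherel's identity gives $\hat\vartheta_t = \frac{1}{2\pi}\int e^{-iut}\,\mathcal{F}\zeta(-u)\,\mathcal{F}K(hu)\,\varphi_n(u)/\varphi_\varepsilon(u)\,du$. The following Python sketch does this for the butterfly functional with $K = 1$ and exponential errors with scale $\eta$, for which $1/\varphi_\varepsilon(u) = 1 - i\eta u$. The flat top profile, the midpoint quadrature and all function names are our own illustrative assumptions and not part of the paper.

```python
import cmath
import math


def flat_top_ft(u):
    """F K(u) of an illustrative flat top kernel: 1 on [-1/2, 1/2], 0 outside (-1, 1)."""
    a = abs(u)
    if a <= 0.5:
        return 1.0
    if a >= 1.0:
        return 0.0
    s = (a - 0.5) / 0.5
    up = math.exp(-1.0 / (1.0 - s))
    down = math.exp(-1.0 / s)
    return up / (up + down)


def butterfly_ft(u):
    """F zeta(u) = 4 sin^2(u/2) / u^2 for zeta(x) = max(1 - |x|, 0); equals 1 at u = 0."""
    if abs(u) < 1e-8:
        return 1.0
    return 4.0 * math.sin(u / 2.0) ** 2 / (u * u)


def estimate(y, t, h, eta, n_grid=400):
    """Midpoint-rule approximation of the estimator (3) on the frequency band [-1/h, 1/h]."""
    du = 2.0 / (h * n_grid)
    total = 0.0
    for k in range(n_grid):
        u = -1.0 / h + (k + 0.5) * du
        phi_n = sum(cmath.exp(1j * u * yj) for yj in y) / len(y)   # empirical char. function
        inv_phi_eps = 1.0 - 1j * eta * u                           # 1/phi_eps, exponential errors
        value = (cmath.exp(-1j * u * t) * butterfly_ft(-u) * flat_top_ft(h * u)
                 * phi_n * inv_phi_eps)
        total += value.real                                        # imaginary parts cancel by symmetry
    return total * du / (2.0 * math.pi)
```

For instance, with simulated data $Y_j = X_j + \varepsilon_j$, $X_j$ standard normal and $\eta = 0.5$, the call `estimate(y, 0.0, 0.2, 0.5)` should be close to $\int \max(1 - |x|, 0) f_X(x)\,dx \approx 0.369$ for moderately large samples.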

Example 3. Butucea and Comte [5] studied the case $\beta > 1$ and derived $\sqrt{n}$-rates for $\gamma_s > \beta$ in
our notation. In particular, they considered supersmooth $\zeta$, that is, $\mathcal{F}\zeta$ decays exponentially.
In this case $\zeta \in H^s(\mathbb{R})$ for any $s \in \mathbb{N}$. Requiring the slightly stronger assumption that
$\langle x \rangle^\tau \zeta(x) \in H^s(\mathbb{R})$ for some arbitrarily small $\tau > 0$ and for all $s \in \mathbb{N}$ we can choose $\zeta^c := \zeta$ and
$\zeta^s := 0$. Then $\beta$ can be taken arbitrarily large such that all gamma distributions, the Laplace
distributions and convolutions of them can be chosen as error distributions.



2.3 Discussion 

To have $\sqrt{n}$-rates we suppose $\gamma_s > \beta$ in Theorem 1, which means that the smoothness of the
functionals compensates the ill-posedness of the problem. This condition is natural in view
of the abstract analysis in terms of Hilbert scales by Goldenshluger and Pereverzev [17], who
obtain the minimax rate $n^{-(\alpha+\gamma_s)/(2\alpha+2\beta)} \vee n^{-1/2}$ in our notation. As a consequence of the
condition on $\gamma_s$ and $\gamma_c$ we can bound the stochastic error term of the estimator $\hat\vartheta_t$ uniformly
in $h \in (0, 1)$. The bias term is of order $h^{\alpha+\gamma_s}$.

For $\gamma_s > \beta + 1/2$ the class $\mathcal{G}$ is a Donsker class. In this case the only condition on the
bandwidth is that the bias tends to zero faster than $n^{-1/2}$. In the interesting but involved
case $\gamma_s \in (\beta, \beta + 1/2]$, the class $\mathcal{G}$ will in general not be a Donsker class. Estimating the
distribution function as in Example 1 belongs to this case. In order to see that $\mathcal{G}$ is in general
not a Donsker class, let the error distribution be given by $f_\varepsilon = \gamma_{\beta,\eta}(-\cdot)$ and $\zeta = \gamma_{\sigma,\eta}$ with
$\sigma \in (\gamma_s + 1/2, \beta + 1)$. Then $g_t$ equals $\gamma_{\sigma-\beta,\eta} * \delta_t$. The shape parameter satisfies $\sigma - \beta \in (1/2, 1)$
and thus $g_t$ is an $L^2(\mathbb{R})$-function unbounded at $t$. The Lebesgue density of $P$ is bounded by
Assumption 1(i). Hence, $\mathcal{G}$ consists of all translates of an unbounded function and thus cannot
be Donsker, cf. Theorem 7 by Nickl [22].
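The two properties of $g_t$ used in this counterexample, unboundedness at $t$ and square integrability, can be checked numerically for a gamma density with shape in $(1/2, 1)$. The sketch below is only an illustration; the function names are hypothetical, and the closed form $\int \gamma_{s,\eta}^2 = \Gamma(2s-1)/(\Gamma(s)^2\, 2^{2s-1}\eta)$ for $s > 1/2$ is a standard gamma integral that we state as our own computation.

```python
import math


def gamma_density(x, shape, eta):
    """Density of the gamma distribution with the given shape and scale eta."""
    if x <= 0.0:
        return 0.0
    return x ** (shape - 1.0) * math.exp(-x / eta) / (math.gamma(shape) * eta ** shape)


def squared_l2_norm_closed_form(shape, eta):
    """int gamma_{shape,eta}(x)^2 dx, finite exactly when shape > 1/2."""
    return math.gamma(2.0 * shape - 1.0) / (
        math.gamma(shape) ** 2 * 2.0 ** (2.0 * shape - 1.0) * eta)


def squared_l2_norm_numeric(shape, eta, v_max=10.0, n_grid=20000):
    """Midpoint rule after substituting x = v^p, p = 1/(2*shape - 1), which removes
    the integrable singularity of x^(2*shape - 2) at zero."""
    p = 1.0 / (2.0 * shape - 1.0)
    dv = v_max / n_grid
    total = sum(math.exp(-2.0 * ((k + 0.5) * dv) ** p / eta) for k in range(n_grid))
    return p * total * dv / (math.gamma(shape) ** 2 * eta ** (2.0 * shape))
```

With shape $\sigma - \beta = 0.75$ the density blows up like $x^{-1/4}$ near zero, while both norm computations return the same finite value.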

Therefore, for $\gamma_s \in (\beta, \beta + 1/2]$ smoothed empirical processes are necessary; in particular,
we need to ensure enough smoothing to be able to obtain a uniform central limit theorem.
The bandwidth cannot tend to zero too fast: more precisely, we require $h_n^{\rho} n \to \infty$ as $n \to \infty$
for some $\rho$ with $\rho > 4\beta - 4\gamma_s + 2$. In combination with the bias condition $h_n^{2\alpha+2\gamma_s} n \to 0$
as $n \to \infty$ we obtain necessarily $\alpha + \gamma_s > 2\beta - 2\gamma_s + 1$, leading to the assumption in
the theorem. Since $2\alpha + 2\gamma_s > \alpha + 2\beta - \gamma_s + 1 > 4\beta - 4\gamma_s + 2$ we can always choose
$h_n \sim n^{-1/(\alpha+2\beta-\gamma_s+1)}$. In contrast to Butucea and Comte [5], Dattner et al. [6], Fan [12] our
choice of the bandwidth $h_n$ is not determined by the bias-variance trade-off, but rather by
the amount of smoothing necessary to obtain a uniform central limit theorem. The classical
bandwidth $h_n \sim n^{-1/(2\alpha+2\beta)}$ is optimal for estimating the density in the sense that it achieves
the minimax rate with respect to the mean integrated squared error (MISE), compare Fan [12]
who assumes Hölder smoothness of $f_X$ instead of $L^2$-Sobolev smoothness. For this choice the
bias condition $h_n^{2\alpha+2\gamma_s} n \to 0$ is satisfied. If $\gamma_s \le \beta + 1/2$ the classical bandwidth satisfies the
additional minimal smoothness condition in the case of estimating the distribution function
under mild conditions on $f_X$. It suffices, for example, that $f_X$ is of bounded variation. Then $\alpha$
and $\gamma_s$ can be chosen large enough in $(0, 1/2)$ such that $2\alpha + 2\beta > 4\beta - 4\gamma_s + 2$ and the classical
bandwidth satisfies the conditions of the theorem. Whenever the classical bandwidth
$h_n \sim n^{-1/(2\alpha+2\beta)}$ satisfies the conditions of Theorem 1, the corresponding density estimator is
a 'plug-in' estimator in the sense of Bickel and Ritov [2], meaning that the density is estimated
rate optimally for the MISE, the functionals are estimated efficiently (see Section 3) and the
estimators of the functionals converge uniformly over $t \in \mathbb{R}$.
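The bandwidth conditions discussed above reduce to simple arithmetic in the exponents. The following Python sketch, with hypothetical function names of our own, checks the smoothness conditions of Theorem 1 that do not involve $\gamma_c$ and computes, for $h_n = n^{-1/\kappa}$, the exponent $e$ in $h_n^{2\alpha+2\gamma_s} n = n^{e}$ together with the lower bound $4\beta - 4\gamma_s + 2$ for $\rho$.

```python
def smoothness_conditions(alpha, beta, gamma_s):
    """gamma_s > beta and alpha + 3*gamma_s > 2*beta + 1 (the gamma_c condition is not checked)."""
    return gamma_s > beta and alpha + 3.0 * gamma_s > 2.0 * beta + 1.0


def bandwidth_exponents(alpha, beta, gamma_s, classical=False):
    """For h_n = n^(-1/kappa) return (kappa, bias_exponent, rho_lower).

    kappa = 2*alpha + 2*beta for the classical bandwidth, and
    kappa = alpha + 2*beta - gamma_s + 1 for the choice made in the text.
    The bias condition holds if bias_exponent < 0; a valid rho exists if rho_lower < kappa,
    because h_n^rho * n = n^(1 - rho/kappa) tends to infinity exactly when rho < kappa.
    """
    kappa = 2.0 * alpha + 2.0 * beta if classical else alpha + 2.0 * beta - gamma_s + 1.0
    bias_exponent = 1.0 - (2.0 * alpha + 2.0 * gamma_s) / kappa
    rho_lower = 4.0 * beta - 4.0 * gamma_s + 2.0
    return kappa, bias_exponent, rho_lower
```

For example, $\alpha = 1$, $\beta = 0.4$ and $\gamma_s = 0.45$ give $\kappa = 2.35$, a negative bias exponent and $\rho$-range $(1.8, 2.35)$, so both conditions can be met simultaneously.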

The smoothness condition on the density $f_X$ is then a consequence of the given choice of $h_n$
together with the classical bias estimate for kernel estimators. As we have seen in Example 1,
for estimating the distribution function we have $\zeta = \mathbb{1}_{(-\infty,0]} \in Z^{\gamma_s,\gamma_c}$ with $\gamma_s < 1/2$ arbitrarily
close to $1/2$. In the classical Donsker theorem, which corresponds to the case $\beta \to 0$, the
condition $\alpha + 3\gamma_s > 2\beta + 1$ would simplify to $\alpha > -1/2$. However, we suppose $f_X$ to be
bounded, which leads to much clearer proofs, and thus $f_X \in H^0(\mathbb{R})$ is automatically satisfied.
Assumption 1 allows us to focus on the interplay between the functional $\zeta$ and the deconvolution
operator $\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}]$. Nickl and Reiß [24] have studied the case of unbounded densities, which
is necessary in the Lévy process setup, but considered $\zeta_t = \mathbb{1}_{(-\infty,t]}$ only. The class $Z^{\gamma_s,\gamma_c}$
is defined by $L^2$-Sobolev conditions so that bounded variation arguments for $\zeta$ have to be
avoided in the proofs.

An interesting aspect is the following: If we restrict the uniform convergence to $(\zeta_t)_{t \in T}$
for some compact set $T \subseteq \mathbb{R}$, it is sufficient to assume $\tfrac{1}{ix+1}\zeta^c \in H^{\gamma_c}(\mathbb{R})$ instead of requiring
$(1 \vee |x|^\tau)(\zeta^c(x) - a(x)) \in H^{\gamma_c}(\mathbb{R})$ for some $\tau > 0$ and a function $a \in C^\infty(\mathbb{R})$ such that $a'$ is
compactly supported as done in $Z^{\gamma_s,\gamma_c}$. In particular, slowly growing $\zeta$ would be allowed. The
stronger condition in the definition of $Z^{\gamma_s,\gamma_c}$ is only needed to ensure polynomial covering
numbers of $\{g_t \mid t \in T\}$ for $T \subseteq \mathbb{R}$ unbounded (cf. Theorem 7 below).

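As a corollary of Theorem 1 we can weaken Assumption 2(ii). If the characteristic function
of the errors $\varepsilon$ is given by $\varphi_\varepsilon = \tilde\varphi_\varepsilon \psi$ where $\tilde\varphi_\varepsilon$ satisfies Assumption 2(ii) and there is a Schwartz
distribution $\nu \in \mathcal{S}'(\mathbb{R})$ such that $\mathcal{F}\nu = \psi^{-1}$ and $\nu * \zeta \in Z^{\gamma_s,\gamma_c}$ for $\zeta \in Z^{\gamma_s,\gamma_c}$, then for $t \in \mathbb{R}$

$\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}] * \zeta(\cdot - t) = \mathcal{F}^{-1}[\tilde\varphi_\varepsilon^{-1}] * (\nu * \zeta)(\cdot - t)$

and thus we can proceed as before. For instance, for translated errors $f_\varepsilon * \delta_\mu$ with $\mu \neq 0$, the
distribution $\nu$ would be given by $\delta_{-\mu}$.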

As for the classical Donsker theorem, the Donsker theorem for deconvolution estimators has
many different applications, the most obvious being the construction of confidence bands.
Further Donsker theorems may be obtained by applying the functional delta method to Hadamard
differentiable maps. Let us illustrate the construction of confidence bands. By the continuous
mapping theorem we infer

$\sup_{t \in \mathbb{R}} \sqrt{n}\,|\hat\vartheta_t - \vartheta_t| \xrightarrow{\mathcal{L}} \sup_{t \in \mathbb{R}} |\mathbb{G}(t)|.$

The construction of confidence bands reduces now to knowledge about the distribution of the
supremum of $\mathbb{G}$. Suprema of Gaussian processes are well studied and information about their
distribution can be obtained either from theoretical considerations as in van der Vaart and
Wellner [30, App. A.2] or from Monte Carlo simulations. Let $q_{1-\alpha}$ be the $(1 - \alpha)$-quantile of
$\sup_{t \in \mathbb{R}} |\mathbb{G}(t)|$, that is $P(\sup_{t \in \mathbb{R}} |\mathbb{G}(t)| \le q_{1-\alpha}) = 1 - \alpha$. Then

$\lim_{n \to \infty} P\big(\vartheta_t \in [\hat\vartheta_t - q_{1-\alpha} n^{-1/2}, \hat\vartheta_t + q_{1-\alpha} n^{-1/2}]$ for all $t \in \mathbb{R}\big) = 1 - \alpha$

and thus the intervals $[\hat\vartheta_t - q_{1-\alpha} n^{-1/2}, \hat\vartheta_t + q_{1-\alpha} n^{-1/2}]$ define a confidence band.
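Once Monte Carlo draws of $\sup_t |\mathbb{G}(t)|$ are available, the band itself is elementary to compute. The following Python sketch assumes that such draws are supplied by the user (how they are simulated is not specified here) and that the estimator has been evaluated on a finite grid of $t$-values; the function names and the empirical quantile convention are our own hypothetical choices.

```python
import math


def empirical_quantile(draws, level):
    """Smallest draw q such that at least a fraction `level` of the draws is <= q."""
    ordered = sorted(draws)
    k = max(math.ceil(level * len(ordered)) - 1, 0)
    return ordered[k]


def confidence_band(theta_hat, sup_draws, n, alpha=0.05):
    """Intervals [theta_hat_t - q / sqrt(n), theta_hat_t + q / sqrt(n)] on a grid of t-values.

    theta_hat : estimates on the grid
    sup_draws : Monte Carlo draws of sup_t |G(t)| (assumed to be given)
    n         : sample size
    """
    q = empirical_quantile(sup_draws, 1.0 - alpha)
    half_width = q / math.sqrt(n)
    return [(value - half_width, value + half_width) for value in theta_hat]
```

The band has constant width $2 q_{1-\alpha} n^{-1/2}$ in $t$, in line with the display above; only the quantile is replaced by its Monte Carlo approximation.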



3 Efficiency 

Having established the asymptotic normality of our estimator, the natural question is whether
it is optimal in the sense of the convolution Theorem 5.2.1 by Bickel et al. [1]. Typically,
efficiency is investigated for estimators $T_n$ which are (locally) regular, that is, for any parametric
submodel $\eta \mapsto f_{X,\eta}$ and $n^{1/2}|\eta_n - \eta| \lesssim 1$ the law of $n^{1/2}(T_n - \langle \zeta, f_{X,\eta_n} \rangle)$ under $\eta_n$ converges for
$n \to \infty$ to a distribution independent of $(\eta_n)$. In Lemma 9 we show that the estimator from
(3) is asymptotically linear with influence function $x \mapsto \int \mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)] * \zeta(y)\,(\delta_x - P)(dy)$
and thus $\hat\vartheta_t$ is Gaussian regular.

In general, semiparametric lower bounds are constructed as the supremum of the information
bounds over all regular parametric submodels. As it turns out, it suffices to apply the
Cramér-Rao bound to the least favorable one-dimensional submodel $P_g$ of the form

$f_{Y,\xi g} = f_{X,\xi g} * f_\varepsilon$ with $f_{X,\xi g} := f_X + \xi g$, for all $\xi \in (-\tau, \tau)$,
with some $\tau > 0$ and a perturbation $g$ satisfying

$f_X \pm \tau g \ge 0$ and $\int g = 0.$ (8)

Note that all laws $P_g$ are absolutely continuous with respect to $P$ assuming $\operatorname{supp}(f_X) = \mathbb{R}$.
Moreover, the submodels are regular with score function $g * f_\varepsilon / f_Y$, since for all $\xi \in (-\tau, \tau) \setminus \{0\}$
we have the $L^2$-differentiability

$\int \Big( \frac{f_{Y,\xi g} - f_Y - \xi\, g * f_\varepsilon}{\xi f_Y} \Big)^2 f_Y = 0.$



Similarly to van der Vaart [29, Chap. 25.5], we define the score operator $Sg := (g * f_\varepsilon) f_Y^{-1/2}$
and thus the information operator of $f_X$ is given by $I := S^*S$, where $S^*$ denotes the adjoint
of the linear operator $S$. This yields the Fisher information in direction $g$

$\langle Ig, g \rangle = \langle Sg, Sg \rangle = \int \frac{(g * f_\varepsilon)^2}{f_Y}$ (9)

and we obtain the information bound

$\sup_g \frac{\langle g, \zeta \rangle^2}{\langle Ig, g \rangle},$ (10)
where the supremum is taken over all g satisfying (8). In the notation of [1, Def. 3.3.2], we
consider the tangent space $\{(g * f_\varepsilon)/f_Y \mid g$ satisfies (8)$\}$ representing the submodels,
and the efficient influence function of the parameter $h \mapsto \langle h, \zeta \rangle$ needs to be
determined.

Since we perturb the density additively with the restriction (8), the quotient $|g/f_X|$ needs
to be bounded and thus it is natural to assume a lower bound for the decay behavior of $f_X$.
We state with some $\delta > 0$ and $M \in \mathbb{N}$

Assumption 3. Let the following be satisfied:

(i) $f_X$ is bounded and fulfills the moment condition $\int |x|^{2+\delta} f_X(x)\,dx < \infty$,
(ii) $f_X \in W_1^2(\mathbb{R})$, that is, $f_X$ has $L^1$-Sobolev regularity two,

(iii) $f_X(x) \gtrsim \langle x \rangle^{-M}$ for $x \in \mathbb{R}$.

A precise definition of the $L^1$-Sobolev space $W_1^2(\mathbb{R})$ can be found in the appendix. Due
to the Sobolev embedding $W_1^2(\mathbb{R}) \subseteq H^\alpha(\mathbb{R})$ with $\alpha < 3/2$ (cf. (44) and (46)), Assumption 3
implies Assumption 1 in the previous section. The conditions on $\varepsilon$ need to be strengthened,
too.

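Assumption 4. We suppose
(i) $\int |x|^{2+\delta} f_\varepsilon(x)\,dx < \infty$,

(ii) for some $\beta \in (0, \infty) \setminus \mathbb{Z}$ and $M$ from above let $\varphi_\varepsilon \in C^{(\lfloor\beta\rfloor \vee M)+1}(\mathbb{R})$ satisfy for all
$k = 0, \dots, (\lfloor\beta\rfloor \vee M) + 1$

$\mathbb{1}_{\{k=0\}}(u)\, \langle u \rangle^{-\beta-k} \lesssim |\varphi_\varepsilon^{(k)}(u)| \lesssim \langle u \rangle^{-\beta-k}.$

Since $M + 1 \ge 2$, easy calculus shows that Assumption 2(ii) on $\varphi_\varepsilon^{-1}$ follows from
Assumption 4 on $\varphi_\varepsilon$. We supposed $\beta \notin \mathbb{Z}$ mainly to simplify our proofs. Let us first show an
information bound for smooth $\zeta$.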

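Theorem 2. Grant Assumptions 3 and 4 and let $\zeta \in \mathcal{S}(\mathbb{R})$ be a Schwartz function. For any
regular estimator $T$ of $\vartheta_0 = \langle \zeta, f_X \rangle$ with asymptotic variance $\sigma^2$ we obtain

$\int \big(\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)] * \zeta\big)^2 f_Y - \vartheta_0^2 \le \sigma^2.$ (11)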

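In particular, the supremum in (10) is attained at $g^* := g^*(\zeta) := I^{-1}\zeta - \langle \zeta, f_X \rangle f_X$, where the
inverses of $S^*$ and $I$ are given by

$(S^*)^{-1}\zeta = \big(\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)] * \zeta\big)\sqrt{f_Y}$ and $I^{-1}\zeta = \mathcal{F}^{-1}[\varphi_\varepsilon^{-1}] * \big\{\big(\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)] * \zeta\big) f_Y\big\}.$

Therefore, the score function corresponding to $g^*(\zeta)$, which is given by

$\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)] * \zeta - \langle \zeta, f_X \rangle$

(compare (37) below), is the efficient influence function and, moreover, equals the influence
function of $\hat\vartheta_0$. This equality shows that the estimator is efficient for smooth functionals.
Moreover, we have already found the efficient influence function in the larger tangent set of all
regular submodels.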

Unfortunately, less smooth $\zeta$ might be only in the domain of $(S^*)^{-1}$ while $I^{-1}\zeta$ is not
in $L^2(\mathbb{R})$ and thus the formal maximizer $g^*(\zeta)$ cannot be applied rigorously, as the following
example shows.

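Example 4. Let $\varepsilon_j$ be gamma distributed with density $\gamma_{\beta,1}$ for $\beta \in (1/4, 1/2)$ and consider
$\zeta(x) = e^{x}\, \mathbb{1}_{(-\infty,0]}(x) = \gamma_{1,1}(-x)$, which is contained in $Z^{\gamma_s,\gamma_c}$ for all $\gamma_s < 1/2$ and $\gamma_c$ arbitrarily
large. We obtain

$(S^*)^{-1}\zeta = \gamma_{1-\beta,1}(-\cdot)\sqrt{f_Y}$ and $I^{-1}\zeta = \mathcal{F}^{-1}\big[(1 - iu)^{\beta}\big((1 + iu)^{-1+\beta} * \varphi\big)\big].$

While the first term behaves nicely, the Fourier transform of $I^{-1}\zeta$ is of order $|u|^{-1+2\beta} > |u|^{-1/2}$
for $|u| \to \infty$ and thus $I^{-1}\zeta \notin L^2(\mathbb{R})$.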

Therefore, we choose an approximating sequence $\zeta_n \to \zeta$ with $(\zeta_n)_{n \in \mathbb{N}} \subseteq \mathcal{S}(\mathbb{R})$. For
$n \in \mathbb{N}$ let $g_n^* := g^*(\zeta_n) = I^{-1}\zeta_n - \langle \zeta_n, f_X \rangle f_X$ be the least favorable direction in the estimation
problem with respect to $(f_X, \zeta_n)$. We obtain for every $n \in \mathbb{N}$

$\sup_g \frac{\langle g, \zeta \rangle^2}{\langle Sg, Sg \rangle} \ge \frac{\langle g_n^*, \zeta \rangle^2}{\langle Sg_n^*, Sg_n^* \rangle} = \frac{\big(\langle g_n^*, \zeta - \zeta_n \rangle + \langle g_n^*, \zeta_n \rangle\big)^2}{\langle Sg_n^*, Sg_n^* \rangle}.$

This inequality suggests two possibilities to understand our strategy for obtaining the efficiency
bound. First, the sequence $(g_n^*)$ approximates the formal maximizer $g^*(\zeta)$ and thus
plugging $g_n^*$ into the bound (10) might converge to the supremum. Second, any unbiased
estimator of $\langle f_X, \zeta_n \rangle$ is at the same time a possibly biased estimator of $\langle f_X, \zeta \rangle$ with bias tending
to zero. Therefore, the bound for the smooth problems should converge to the nonsmooth one.
The following lemma provides a sufficient condition for the convergence of the Cramér-Rao
bounds.

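Lemma 3. Let $\zeta$ and $(\zeta_n)$ satisfy $(S^*)^{-1}\zeta \in L^2(\mathbb{R})$ and $\zeta_n, I^{-1}\zeta_n \in L^2(\mathbb{R})$ for all $n \in \mathbb{N}$.
Then $\langle \zeta_n, f_X \rangle \to \langle \zeta, f_X \rangle$ and $\frac{\langle g_n^*, \zeta \rangle^2}{\langle Sg_n^*, Sg_n^* \rangle} \to \langle (S^*)^{-1}\zeta, (S^*)^{-1}\zeta \rangle - \langle \zeta, f_X \rangle^2$ hold as $n \to \infty$ if

$\|(S^*)^{-1}(\zeta_n - \zeta)\|_{L^2} \to 0$, as $n \to \infty$.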

Using mapping properties on Besov spaces, we will show that the underlying Fourier
multiplier $\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}]$ and thus the inverse adjoint score operator $(S^*)^{-1}$ are well-defined on
the set $Z^{\gamma_s,\gamma_c}$. This allows the extension of Theorem 2 to all $\zeta \in Z^{\gamma_s,\gamma_c}$ with $\gamma_s > \beta$ and
$\gamma_c > \beta + 1/2$.

Since $\hat\vartheta_t$ does not only estimate $\vartheta_t$ pointwise but also as a process in $\ell^\infty(\mathbb{R})$, we want to
generalize Theorem 2 in this direction, too. In view of Theorem 25.48 of van der Vaart [29]
the remaining ingredient is the tightness of the limiting object, which is already a necessary
condition for the Donsker theorem. A regular estimator $T_n$ of $(\vartheta_t)_{t \in \mathbb{R}}$ in $\ell^\infty(\mathbb{R})$ is efficient if
the limiting distribution of $\sqrt{n}(T_n - \vartheta)$ is a tight zero mean Gaussian process whose covariance
structure is given by the information bound for the finite dimensional distributions (cf. the
convolution Theorem 5.2.1 of [1]). Interestingly, the class of efficient influence functions for
$t \in \mathbb{R}$ is not Donsker as discussed above and thus there exists no efficient estimator which is
asymptotically linear in $\ell^\infty(\mathbb{R})$ [cf. 20, Thm. 18.8].

Theorem 4. Let Assumptions 3 and 4 be satisfied as well as $\zeta \in Z^{\gamma_s,\gamma_c}$ with $\gamma_s > \beta$ and
$\gamma_c > \beta + 1/2$. Then the estimator $(\hat\vartheta_t)_{t \in \mathbb{R}}$ defined in (3) is (uniformly) efficient.



Additionally, the proof of Theorem 4 reveals the relation between the intrinsic metric
$d(s,t)^2 = E[(\mathbb{G}_s - \mathbb{G}_t)^2]$ of the limit $\mathbb{G}$, which is essential to show tightness, and the metric
$d_{I^{-1}}(s,t)^2 = \langle (S^*)^{-1}(\zeta_t - \zeta_s), (S^*)^{-1}(\zeta_t - \zeta_s) \rangle$ which is induced by the inverse Fisher
information, namely

$d_{I^{-1}}(s,t)^2 = d(s,t)^2 + \langle \zeta_t - \zeta_s, f_X \rangle^2$

(cf. equations (25) and (43) below) such that both metrics are equal up to some centering
term, which is another way of interpreting the efficiency of $\hat\vartheta_t$.

4 Proof of the Donsker theorem 

First, we provide an auxiliary lemma, which describes the properties of the deconvolution
operator $\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}]$.

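Lemma 5. Grant Assumption 2.

(i) For all $s \in \mathbb{R}$, $p,q \in [1,\infty]$ the deconvolution operator $\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]$ is a Fourier
multiplier from $B^s_{p,q}(\mathbb{R})$ to $B^{s-\beta}_{p,q}(\mathbb{R})$, that is, the linear map

$B^s_{p,q}(\mathbb{R}) \to B^{s-\beta}_{p,q}(\mathbb{R}), \quad f \mapsto \mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}f]$

is bounded.

(ii) For any integer $m$ strictly larger than $\beta$ we have $\mathcal{F}^{-1}[(1 + iu)^{-m}\varphi_\varepsilon^{-1}] \in L^1(\mathbb{R})$ and if
$m > \beta + 1/2$ we also have $\mathcal{F}^{-1}[(1 + iu)^{-m}\varphi_\varepsilon^{-1}] \in L^2(\mathbb{R})$.

(iii) Let $\beta^+ > \beta$ and $f, g \in H^{\beta^+}(\mathbb{R})$. Then

$\int \big(\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}] * f\big) g = \int \big(\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)] * g\big) f.$ (12)

Using the kernel $K$, this equality extends to functions $g \in L^2(\mathbb{R}) \cup L^\infty(\mathbb{R})$ and finite
Borel measures $\mu$:

$\int \big(\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}\mathcal{F}K_h] * \mu\big) g = \int \big(\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}K_h] * g\big)\,d\mu.$ (13)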

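Proof.

(i) Analogously to [24], we deduce from Corollary 4.11 of [16] that $(1 + iu)^{-\beta}\varphi_\varepsilon^{-1}(-u)$ is a
Fourier multiplier on $B^s_{p,q}$ by Assumption 2(ii). It remains to note that $j : B^s_{p,q}(\mathbb{R}) \to B^{s-\beta}_{p,q}(\mathbb{R})$,
$f \mapsto \mathcal{F}^{-1}[(1 + iu)^{\beta}\mathcal{F}f]$ is a linear isomorphism [28, Thm. 2.3.8].

(ii) Since the gamma density $\gamma_{1,1}$ is of bounded variation, it is contained in $B^1_{1,\infty}(\mathbb{R})$ by
(51). Using the isomorphism $j$ from (i), we deduce $\gamma_{m,1} \in B^m_{1,\infty}(\mathbb{R})$ and thus by Besov
embeddings (47) and (44)

$\mathcal{F}^{-1}[(1 + iu)^{-m}\varphi_\varepsilon^{-1}] \in B^{m-\beta}_{1,\infty}(\mathbb{R}) \subseteq B^0_{1,1}(\mathbb{R}) \subseteq L^1(\mathbb{R}).$
If $m - \beta > 1/2$ we can apply the embedding $B^{m-\beta}_{1,\infty}(\mathbb{R}) \subseteq B^{m-\beta-1/2}_{2,\infty}(\mathbb{R}) \subseteq L^2(\mathbb{R})$.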
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(iii) For / G (R) (i) and the Besov embeddings (44), (46) and (47) yield 

H-F-Vr 1 ] *f\\„ < || J^ 1 ^ 1 ] £ ll/ll^ £ H/IU+ < oo- 

Therefore, it follows by Plancherel's equality 

1 (j- 1 [^ 1 ]*/)(x)<7(x)dx = i- y ^-\-u)Tf{- U )Tg{u)du 

= J {^- 1 W- 1 {--)]*g)(x)f{x)dx. 

To prove the second part of the claim for g G L 2 (R), we note that by Young's inequality 

II^M^^^IIIl 2 < \\J r ~ 1 [v>7 ll [-yh,i/h]]\\L'\\Kh\\L^ < 00 

due to the support of T K and Assumption (5) on the decay of K. Since fx is a finite 
measure and 5 is bounded, Fubini's theorem yields then 

J g(x){T- 1 [ ( p- 1 TK h ]* f x)(x)dx 

= J j g{x)T- 1 Yp- 1 TK h ]{x-y) l x{dy)dx 

= j {F- l [<p-\-.)F K h ]* g){y)ii{dy), 

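where we have used the symmetry of the kernel. In order to apply Fubini's theorem for the case $g\in L^\infty(\mathbb{R})$, too, we have to show that $\|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}\mathcal{F}K_h]\|_{L^1}$ is finite. We replace the indicator function by a function $\chi\in C^\infty(\mathbb{R})$ which equals one on $[-1/h,1/h]$ and has compact support. We estimate

$$\|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}\mathcal{F}K_h]\|_{L^1}\le\|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}\chi]\|_{L^1}\|K_h\|_{L^1}.\quad(14)$$

Using that $\varphi_\varepsilon^{-1}\chi$ is twice continuously differentiable and has compact support we obtain

$$\|(1+x^2)\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}\chi](x)\|_\infty\le\|\mathcal{F}^{-1}[(\mathrm{Id}-D^2)(\varphi_\varepsilon^{-1}\chi)](x)\|_\infty\lesssim\|(\mathrm{Id}-D^2)(\varphi_\varepsilon^{-1}\chi)\|_{L^1}<\infty,$$

where we denote the identity and the differential operator by $\mathrm{Id}$ and $D$, respectively. This shows that (14) is finite. □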
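The adjoint equality (12) can be sanity-checked numerically. The following is a minimal sketch under assumptions that are not taken from the text: standard exponential errors with characteristic function $1/(1-iu)$ (so $\beta=1$), the convention $\mathcal{F}f(u)=\int e^{iux}f(x)\,dx$, and two illustrative Gaussian-type test functions `f` and `g`. Under these assumptions $\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}]*f=f+f'$ and $\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*g=g-g'$, so (12) reduces to one integration by parts.

```python
import math

def trapezoid(func, a, b, n):
    """Composite trapezoidal rule with n subintervals on [a, b]."""
    step = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for k in range(1, n):
        total += func(a + k * step)
    return total * step

# Illustrative test functions (hypothetical, not from the text) with explicit derivatives.
def f(x):
    return math.exp(-0.5 * x * x)

def df(x):
    return -x * f(x)

def g(x):
    return math.exp(-(x - 1.0) ** 2)

def dg(x):
    return -2.0 * (x - 1.0) * g(x)

# Assumed exponential errors: 1/phi(u) = 1 - iu acts as Id + D,
# and the reflected multiplier 1/phi(-u) = 1 + iu acts as Id - D.
lhs = trapezoid(lambda x: (f(x) + df(x)) * g(x), -15.0, 15.0, 6000)
rhs = trapezoid(lambda x: (g(x) - dg(x)) * f(x), -15.0, 15.0, 6000)
# Dropping the reflection on the right-hand side breaks the identity.
unreflected = trapezoid(lambda x: (g(x) + dg(x)) * f(x), -15.0, 15.0, 6000)

print(lhs, rhs, unreflected)
```

The first two printed values agree up to quadrature error, while the third differs, which illustrates why the reflected multiplier $\varphi_\varepsilon^{-1}(-\cdot)$ appears on the right-hand side of (12) when the error distribution is not symmetric.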

4.1 Convergence of the finite dimensional distributions 

As usual, we decompose the error into a stochastic error term and a bias term: 

$$\hat\vartheta_t-\vartheta_t=\hat\vartheta_t-E[\hat\vartheta_t]+E[\hat\vartheta_t]-\vartheta_t=\int\zeta_t(x)\mathcal{F}^{-1}\Big[\mathcal{F}K_h\frac{\varphi_n-\varphi}{\varphi_\varepsilon}\Big](x)\,dx+\int\zeta_t(x)(K_h*f_X(x)-f_X(x))\,dx.$$

4.1.1 The bias

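The bias term can be estimated by the standard kernel estimator argument. Let us consider the singular and the continuous part of $\zeta$ separately. Applying Plancherel's identity and Hölder's inequality, we obtain

$$\Big|\int\zeta^s_t(x)(K_h*f_X(x)-f_X(x))\,dx\Big|\lesssim\|\langle u\rangle^{-(\alpha+\gamma_s)}(\mathcal{F}K(hu)-1)\|_\infty\int\langle u\rangle^{\alpha+\gamma_s}|\mathcal{F}\zeta^s_t(u)\mathcal{F}f_X(u)|\,du$$

$$\le h^{\alpha+\gamma_s}\|u^{-(\alpha+\gamma_s)}(\mathcal{F}K(u)-1)\|_\infty\|\zeta^s\|_{H^{\gamma_s}}\|f_X\|_{H^{\alpha}}.$$

The term $\|u^{-(\alpha+\gamma_s)}(\mathcal{F}K(u)-1)\|_\infty$ is finite using a Taylor expansion of $\mathcal{F}K$ around $0$ with $(\mathcal{F}K)^{(l)}(0)=0$ for $l=1,\dots,\lfloor\alpha+\gamma_s\rfloor$ by the order of the kernel (4).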
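For the smooth part of $\zeta$ Plancherel's identity yields

$$\Big|\int\zeta^c_t(x)(K_h*f_X-f_X)(x)\,dx\Big|=\frac{1}{2\pi}\Big|\int\mathcal{F}\big[\tfrac{1}{ix+1}\zeta^c_t(x)\big](u)\,(\mathrm{Id}+D)\{(\mathcal{F}K(hu)-1)\mathcal{F}f_X(-u)\}\,du\Big|$$

$$\lesssim\int\big|\mathcal{F}\big[\tfrac{1}{ix+1}\zeta^c_t(x)\big](u)\,(\mathcal{F}K(hu)-1+h\mathcal{F}[ixK](hu))\,\mathcal{F}f_X(-u)\big|\,du+\int\big|\mathcal{F}\big[\tfrac{1}{ix+1}\zeta^c_t(x)\big](u)\,(\mathcal{F}K(hu)-1)\,\mathcal{F}[ixf_X](-u)\big|\,du.$$

The first term can be estimated as before and for the second term we note that $xf_X(x)\in L^2(\mathbb{R})=H^0(\mathbb{R})$ by Assumption 1(i) such that the additional smoothness of $\tfrac{1}{ix+1}\zeta^c(x)$ yields the right order. Therefore, we have $|E[\hat\vartheta_t]-\vartheta_t|\lesssim h^{\alpha+\gamma_s}$ and thus, by the choice of $h$, the bias term is of order $o(n^{-1/2})$.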
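To see how a bias bound of order $h^{\alpha+\gamma_s}$ translates into a condition on the bandwidth, the following sketch evaluates $\sqrt{n}\,h^{\alpha+\gamma_s}$ for polynomial bandwidths $h=n^{-a}$. The parametrization $h=n^{-a}$, the function names and the smoothness values are illustrative assumptions; the sketch only uses that the scaled bound tends to zero exactly when $a>1/(2\alpha+2\gamma_s)$.

```python
import math

def scaled_bias_bound(n, a, alpha, gamma_s):
    """sqrt(n) * h^(alpha + gamma_s) for the (assumed) bandwidth h = n^(-a)."""
    h = n ** (-a)
    return math.sqrt(n) * h ** (alpha + gamma_s)

def critical_exponent(alpha, gamma_s):
    """Bandwidth exponent at which sqrt(n) * h^(alpha + gamma_s) stays constant."""
    return 1.0 / (2.0 * alpha + 2.0 * gamma_s)

alpha, gamma_s = 1.0, 0.5          # illustrative smoothness values
a_crit = critical_exponent(alpha, gamma_s)
sample_sizes = (1e2, 1e4, 1e6)

# Undersmoothing (a above the critical value): the scaled bias bound vanishes.
fast = [scaled_bias_bound(n, a_crit + 0.1, alpha, gamma_s) for n in sample_sizes]
# Oversmoothing (a below the critical value): the scaled bias bound grows.
slow = [scaled_bias_bound(n, a_crit - 0.1, alpha, gamma_s) for n in sample_sizes]

print(a_crit, fast, slow)
```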

4.1.2 The stochastic error 

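We notice that $\|\zeta^c-a\|_{H^{\gamma_c}}\lesssim\|\langle x\rangle^{-\tau}\|_{C^s}\|\langle x\rangle^{\tau}(\zeta^c(x)-a(x))\|_{H^{\gamma_c}}<\infty$ for any $s>\gamma_c$, where we used the pointwise multiplier property (48) as well as the Besov embeddings (47) and (45). We have $\zeta^s\in L^2$ and by (44), (46) and (47)

$$\|\zeta^c\|_\infty\le\|a\|_\infty+\|\zeta^c-a\|_\infty\lesssim\|a\|_\infty+\|\zeta^c-a\|_{H^{\gamma_c}}<\infty,$$

since $\gamma_c>1/2$. Consequently we can apply the smoothed adjoint equality (13) and obtain for the stochastic error term

$$\int\zeta_t(x)\mathcal{F}^{-1}\Big[\mathcal{F}K_h\frac{\varphi_n-\varphi}{\varphi_\varepsilon}\Big](x)\,dx=\int\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}K_h]*\zeta_t(x)\,(\mathbb{P}_n-\mathbb{P})(dx).\quad(15)$$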



Therefore, it suffices for the convergence of the finite dimensional distributions to bound the term

$$\sup_{h\in(0,1)}\int\big|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}K_h]*\zeta(x)\big|^{2+\delta}\,\mathbb{P}(dx),\quad(16)$$

for any function $\zeta\in Z^{\gamma_s,\gamma_c}$. Then the stochastic error term converges in distribution to a normal random variable by the central limit theorem under the Lyapunov condition [e.g., 19, Thm. 15.43 together with Lem. 15.41]. Finally, the Cramér-Wold device yields the convergence of the finite dimensional distributions in Theorem 1.

First, note that the moment conditions in Assumptions 1 and 2 and the estimate

$$|x|^pf_Y(x)\le\int|x-y+y|^pf_X(x-y)f_\varepsilon(y)\,dy\lesssim(|y|^pf_X)*f_\varepsilon+f_X*(|y|^pf_\varepsilon),$$

for $x\in\mathbb{R}$, $p\ge1$, yield finite $(2+\delta)$th moments for $\mathbb{P}$ since

$$\int|x|^{2+\delta}f_Y(x)\,dx\lesssim\||x|^{2+\delta}f_X\|_{L^1}\|f_\varepsilon\|_{L^1}+\|f_X\|_{L^1}\||x|^{2+\delta}f_\varepsilon\|_{L^1}<\infty.\quad(17)$$

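To estimate (16), we rewrite

$$\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta^c(x)=\mathcal{F}^{-1}\big[\varphi_\varepsilon^{-1}(-u)\,(\mathrm{Id}+D)\,\mathcal{F}\big[\tfrac{1}{iy+1}\zeta^c(y)\big](u)\big](x)$$

$$=\mathcal{F}^{-1}\big[\varphi_\varepsilon^{-1}(-u)\,\mathcal{F}\big[\tfrac{1}{iy+1}\zeta^c(y)\big](u)\big](x)+\mathcal{F}^{-1}\big[\varphi_\varepsilon^{-1}(-u)\,\big(\mathcal{F}\big[\tfrac{1}{iy+1}\zeta^c(y)\big]\big)'(u)\big](x)\quad(18)$$

$$=(1+ix)\,\mathcal{F}^{-1}\big[\varphi_\varepsilon^{-1}(-u)\,\mathcal{F}\big[\tfrac{1}{iy+1}\zeta^c(y)\big](u)\big](x)+\mathcal{F}^{-1}\big[(\varphi_\varepsilon^{-1})'(-u)\,\mathcal{F}\big[\tfrac{1}{iy+1}\zeta^c(y)\big](u)\big](x),$$

owing to the product rule for differentiation. Hence,

$$\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta(x)=\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-u)\mathcal{F}\zeta^s(u)](x)+(1+ix)\,\mathcal{F}^{-1}\big[\varphi_\varepsilon^{-1}(-u)\,\mathcal{F}\big[\tfrac{1}{iy+1}\zeta^c(y)\big](u)\big](x)+\mathcal{F}^{-1}\big[(\varphi_\varepsilon^{-1})'(-u)\,\mathcal{F}\big[\tfrac{1}{iy+1}\zeta^c(y)\big](u)\big](x).\quad(19)$$

While $\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta$ may exist only in the distributional sense in general, it is defined rigorously through the right-hand side of the above display for $\zeta\in Z^{\gamma_s,\gamma_c}$. Considering $\zeta*K_h$ instead of $\zeta$, we estimate all three terms separately in the following.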

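The continuity and linearity of the Fourier multiplier $\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]$, which was shown in Lemma 5(i), yield for the first term in (19)

$$\|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-u)\mathcal{F}\zeta^s(u)\mathcal{F}K_h(u)]\|_{H^{\delta}}=\|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}[\zeta^s*K_h]]\|_{B^{\delta}_{2,2}}\lesssim\|\zeta^s*K_h\|_{B^{\beta+\delta}_{2,2}}\lesssim\|\zeta^s\|_{H^{\beta+\delta}},$$

where the last inequality holds by $\|\mathcal{F}K_h\|_\infty\le\|K\|_{L^1}$. Using the boundedness of $f_Y$ and the continuous Sobolev embedding $H^{\delta/4}(\mathbb{R})\subseteq L^{2+\delta}(\mathbb{R})$ by (44), (47) and (46), we obtain

$$\|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-u)\mathcal{F}\zeta^s(u)\mathcal{F}K_h(u)]\|_{L^{2+\delta}(\mathbb{P})}\lesssim\|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-u)\mathcal{F}\zeta^s(u)\mathcal{F}K_h(u)]\|_{H^{\delta}}\lesssim\|\zeta^s\|_{H^{\beta+\delta}}.\quad(20)$$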
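To estimate the second term in (19), we use the Cauchy-Schwarz inequality and Assumption 2(ii):

$$\big\|\mathcal{F}^{-1}\big[\varphi_\varepsilon^{-1}(-u)\,\mathcal{F}\big[\tfrac{1}{ix+1}\zeta^c(x)\big](u)\,\mathcal{F}K_h(u)\big]\big\|_\infty\lesssim\|\langle u\rangle^{-1/2-\beta-\delta}\varphi_\varepsilon^{-1}(-u)\|_{L^2}\,\big\|\langle u\rangle^{1/2+\beta+\delta}\mathcal{F}\big[\tfrac{1}{ix+1}\zeta^c(x)\big]\big\|_{L^2}\lesssim\big\|\tfrac{1}{ix+1}\zeta^c(x)\big\|_{H^{1/2+\beta+\delta}}.$$

Thus $\int(1+x^2)^{(2+\delta)/2}f_Y(x)\,dx<\infty$ from (17) yields

$$\big\|(1+ix)\,\mathcal{F}^{-1}\big[\varphi_\varepsilon^{-1}(-u)\,\mathcal{F}\big[\tfrac{1}{ix+1}\zeta^c(x)\big](u)\,\mathcal{F}K_h(u)\big]\big\|_{L^{2+\delta}(\mathbb{P})}\lesssim\big\|\tfrac{1}{ix+1}\zeta^c(x)\big\|_{H^{1/2+\beta+\delta}}.\quad(21)$$

The last term in the decomposition (19) can be estimated similarly using the Cauchy-Schwarz inequality and Assumption 2(ii) for $(\varphi_\varepsilon^{-1})'$:

$$\big\|\mathcal{F}^{-1}\big[(\varphi_\varepsilon^{-1})'(-u)\,\mathcal{F}\big[\tfrac{1}{ix+1}\zeta^c(x)\big](u)\,\mathcal{F}K_h(u)\big]\big\|_{L^{2+\delta}(\mathbb{P})}\lesssim\big\|(\varphi_\varepsilon^{-1})'(-u)\,\mathcal{F}\big[\tfrac{1}{ix+1}\zeta^c(x)\big](u)\big\|_{L^1}$$

$$\lesssim\|\langle u\rangle^{-1/2-\delta}\|_{L^2}\,\big\|\langle u\rangle^{\beta-1/2+\delta}\mathcal{F}\big[\tfrac{1}{ix+1}\zeta^c(x)\big]\big\|_{L^2}\lesssim\big\|\tfrac{1}{ix+1}\zeta^c(x)\big\|_{H^{\beta-1/2+\delta}}.\quad(22)$$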
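Combining (20), (21) and (22), we obtain

$$\sup_{h\in(0,1)}\|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}K_h]*\zeta(x)\|_{L^{2+\delta}(\mathbb{P})}\lesssim\|\zeta\|_{Z^{\beta+\delta,1/2+\beta+\delta}},\quad(23)$$

which is finite for $\delta$ small enough satisfying $\beta+\delta\le\gamma_s$ and $1/2+\beta+\delta\le\gamma_c$. Since $\mathcal{F}K_h$ converges pointwise to one and $|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}K_h]*\zeta(x)|^2$ is uniformly integrable by the bound of the $2+\delta$ moments, the variance converges to

$$\int\big|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta(x)\big|^2\,\mathbb{P}(dx).$$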

4.2 Tightness 

Motivated by the representation (15) of the stochastic error, we introduce the empirical process

$$\nu_n(t):=\sqrt{n}\int\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}K_h]*\zeta_t(x)\,(\mathbb{P}_n-\mathbb{P})(dx),\quad t\in\mathbb{R}.\quad(24)$$

In order to show tightness of the empirical process, we first show some properties of the class of translations $U:=\{\zeta_t\,|\,t\in\mathbb{R}\}$ for $\zeta\in Z^{\gamma_s,\gamma_c}$.

Lemma 6. For $\zeta\in Z^{\gamma_s,\gamma_c}$ the following is satisfied:

(i) The decomposition $\zeta_t=\zeta^s_t+\zeta^c_t$ satisfies the conditions in the definition of $Z^{\gamma_s,\gamma_c}$ with $a_t$. We have $\sup_{t\in\mathbb{R}}\|\zeta_t\|_{Z^{\gamma_s,\gamma_c}}<\infty$.

(ii) For any $\eta\in(0,\gamma_s)$ there is a $\tau>0$ such that $\|\zeta_t-\zeta_s\|_{Z^{\gamma_s-\eta,\gamma_c-\eta}}\lesssim|t-s|^{\tau}$ holds for all $s,t\in\mathbb{R}$ with $|t-s|\le1$.

Proof. 

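(i) Since $\|\zeta^s_t\|^2_{H^{\gamma_s}}=\int\langle u\rangle^{2\gamma_s}|e^{itu}\mathcal{F}\zeta^s(u)|^2\,du=\|\zeta^s\|^2_{H^{\gamma_s}}$, both claims hold for the singular part. Applying the pointwise multiplier property of Besov spaces (48) as well as the Besov embeddings (47) and (45), we obtain for some $M>\gamma_c$ and $a\in C^\infty(\mathbb{R})$ as in definition (6)

$$\|\langle x\rangle^{\tau}(\zeta^c_t(x)-a_t(x))\|_{H^{\gamma_c}}\lesssim\Big\|\frac{\langle x\rangle^{\tau}}{\langle x-t\rangle^{\tau}}\Big\|_{C^M}\|\langle x-t\rangle^{\tau}(\zeta^c_t(x)-a_t(x))\|_{H^{\gamma_c}}=\Big\|\frac{\langle x\rangle^{\tau}}{\langle x-t\rangle^{\tau}}\Big\|_{C^M}\|\langle x\rangle^{\tau}(\zeta^c(x)-a(x))\|_{H^{\gamma_c}},$$

which is finite for all $t\in\mathbb{R}$ since $\langle x\rangle^{\tau}\langle x-t\rangle^{-\tau}\in C^M(\mathbb{R})$. For the second claim we estimate similarly

$$\sup_{t\in\mathbb{R}}\big\|\tfrac{1}{ix+1}\zeta^c_t(x)\big\|_{H^{\gamma_c}}\lesssim\sup_{t\in\mathbb{R}}\big\|\tfrac{1}{ix+1}a_t\big\|_{H^{\gamma_c}}+\big\|\tfrac{1}{ix+1}\big\|_{C^M}\sup_{t\in\mathbb{R}}\|\zeta^c_t-a_t\|_{H^{\gamma_c}}$$

$$\lesssim\big\|\tfrac{1}{ix+1}\big\|_{H^{\gamma_c}}\|a\|_{C^M}+\big\|\tfrac{1}{ix+1}\big\|_{C^M}\|\zeta^c-a\|_{H^{\gamma_c}}<\infty.$$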
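(ii) For the singular part note that

$$\|\zeta^s_t-\zeta^s_s\|_{H^{\gamma_s-\eta}}\le\|\langle u\rangle^{\gamma_s}\mathcal{F}\zeta^s(u)\|_{L^2}\,\|\langle u\rangle^{-\eta}(1-e^{i(t-s)u})\|_\infty$$

$$\lesssim\|\langle u\rangle^{-\eta}\|_{L^\infty(\mathbb{R}\setminus(-|t-s|^{-1/2},|t-s|^{-1/2}))}\vee\|(1-e^{i(t-s)u})\|_{L^\infty((-|t-s|^{-1/2},|t-s|^{-1/2}))}$$

$$\lesssim|t-s|^{\eta/2}\vee|t-s|^{1/2}.$$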



For £ c we have 



< 



1 /-c 



+ 



i(x-s+t)+l^s 



C(*) - i^rC(x) 



fffc-v 



The first term can be treated analogously to $\zeta^s$. Using some integer $M\in\mathbb{N}$ strictly larger than $\gamma_c$, the second term can be estimated by



i(x— s+t)+ 



T C(x) 



1 AC 



ij7c->? 



< 



< 



< 



|t - s\ 

\t - s\ 
\t-s\, 



+t)+l ix+l^s( x ) 



i(a;-s+t)+l 



i/7c->7 

1 /-c 



ff~«c-V 



where we used again pointwise multiplier (48), embedding properties of Besov spaces 
(47) and (45) as well as (i). □ 
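The estimate for the singular part in the proof of Lemma 6(ii) rests on splitting the frequency axis at $|u|=|t-s|^{-1/2}$: for large $|u|$ the weight $\langle u\rangle^{-\eta}$ is at most $|t-s|^{\eta/2}$, and for small $|u|$ one has $|1-e^{i(t-s)u}|\le|t-s||u|\le|t-s|^{1/2}$. The following sketch checks the resulting bound on a grid; the grid parameters, the function names and the explicit constant 2 are illustrative choices, not taken from the text.

```python
import math

def sup_weighted_modulus(delta, eta, u_max=1.0e4, steps=100000):
    """Grid approximation of sup over 0 <= u <= u_max of <u>^(-eta) * |1 - exp(i*delta*u)|."""
    best = 0.0
    for k in range(steps + 1):
        u = u_max * k / steps
        weight = (1.0 + u * u) ** (-eta / 2.0)
        modulus = 2.0 * abs(math.sin(0.5 * delta * u))  # equals |1 - e^{i delta u}|
        best = max(best, weight * modulus)
    return best

def claimed_bound(delta, eta):
    """2 * (delta^(eta/2) v delta^(1/2)), the bound from the two-region argument."""
    return 2.0 * max(delta ** (eta / 2.0), delta ** 0.5)

eta = 0.5  # illustrative loss of smoothness
results = {delta: (sup_weighted_modulus(delta, eta), claimed_bound(delta, eta))
           for delta in (1.0, 0.1, 0.01)}
print(results)
```

By symmetry in $u$ it is enough to scan $u\ge0$; the printed suprema shrink polynomially as $|t-s|=\delta$ decreases, which is the Hölder-type continuity in $t$ used for the covering number bounds.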
4.2.1 Pregaussian limit process

Let $\mathbb{G}$ be the stochastic process from Theorem 1. It induces the intrinsic covariance metric $d(s,t):=E[(\mathbb{G}_s-\mathbb{G}_t)^2]^{1/2}$.

Theorem 7. There exists a version of $\mathbb{G}$ with uniformly $d$-continuous sample paths almost surely and with $\sup_{t\in\mathbb{R}}|\mathbb{G}_t|<\infty$ almost surely.

The proof of the theorem shows in addition that $\mathbb{R}$ is totally bounded with respect to $d$. The boundedness of the sample paths follows from the totally bounded index set and the uniform continuity. Further we conclude that $\mathcal{G}$ defined in (7) is $\mathbb{P}$-pregaussian by van der Vaart and Wellner [30, p. 89]. Thus $\mathbb{G}$ is a tight Borel random variable in $\ell^\infty(\mathbb{R})$ and the law of $\mathbb{G}$ is uniquely defined through the covariance structure and the sample path properties in the theorem [30, Lem. 1.5.3].

Proof. To show that the class is pregaussian, it suffices to verify polynomial covering numbers. To that end we deduce that

$$d(s,t)=\big(\|g_t-g_s\|^2_{L^2(\mathbb{P})}-\langle\zeta_t-\zeta_s,f_X\rangle^2\big)^{1/2}\le\|g_t-g_s\|_{L^2(\mathbb{P})}\quad(25)$$

decreases polynomially for $|t-s|\to0$, for $\max(s,t)\to-\infty$ and for $\min(s,t)\to\infty$. Using the same estimates which show the moment bound (23) but replacing $\mathcal{F}K_h$ by $1$, we obtain

$$\|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta\|_{L^2(\mathbb{P})}\lesssim\|\zeta\|_{Z^{\beta+\delta,1/2+\beta+\delta}}\quad(26)$$

and thus by choosing $\delta$ and $\eta$ small enough Lemma 6 yields $d(s,t)\lesssim\|\zeta_t-\zeta_s\|_{Z^{\beta+\delta,1/2+\beta+\delta}}\lesssim|t-s|^{\tau}$. We now turn to the estimation of the tails. We will only consider the case $s,t\ge N$ since the case $s,t\le-N$ can be treated in the same way. Without loss of generality, let $s<t$.

For the smooth component of $\zeta$ we have to show that $\|\tfrac{1}{ix+1}(\zeta^c_t(x)-\zeta^c_s(x))\|_{H^{\gamma_c}}$ with $t,s\ge N$ decays polynomially in $N$. It suffices to prove that $\|\tfrac{1}{ix+1}(\zeta^c_t-a_t)(x)\|_{H^{\gamma_c}}$ and $\|\tfrac{1}{ix+1}(a_t-a_s)(x)\|_{H^{\gamma_c}}$ with $a\in C^\infty(\mathbb{R})$ from definition (6) of $Z^{\gamma_s,\gamma_c}$ both decay polynomially in $N$. Let
M > 7 C and ip £ C M (R) with if>(x) = 1 for x G R\[-|, \\ and ip(x) = for x G [-\, {]. The 
pointwise multiplier property (48) yields 

\\drACt-a t )(x)\\ H , c 

= \\{Mx/N) + (1 - ^x/N)))^-^ - a)(x)\\ Hlc 

S HCTTTllcM||^/iV)(C c - a)(x)\\ H , c + ||i^?|| c -IIC c " «ll^c 

< ||(x)-^(^/iV)||cM|l^) r (C c -a)(x)||^ c +iV- 1 ||C c -a||^e 

< aH tA1 ) 

and for N large enough such that supp(a') C [— N/2, N/2] we obtain 

\\dn( a t- a s){x)\\ Hlc 

= \\^^-a s ){ X )\\ Hlc < \\^\\ H -J(a t -a s )(x)\\ cM 

< ||(zx+ir 3/4 L 7c ||V'(x/Ar)(^+i)- i /4|| cM <iv-v4. 

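To bound the singular part it suffices to show that $\int_{-\infty}^{\infty}|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}\zeta^s](x)|^2f_Y(x+t)\,dx$ decays polynomially in $N$. To this end, we split the integral domain into

$$\Big(\int_{-\infty}^{-N/2}+\int_{-N/2}^{\infty}\Big)|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}\zeta^s](x)|^2f_Y(x+t)\,dx.\quad(27)$$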

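To estimate the first term, we use the following auxiliary calculations

$$ix\,\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}\zeta^s](x)=-\mathcal{F}^{-1}[(\varphi_\varepsilon^{-1})'(-\cdot)\mathcal{F}\zeta^s](x)+\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}[iy\zeta^s(y)]](x)$$

and with an integer $M\in\mathbb{N}$ strictly larger than $\gamma_s$ and a function $\chi\in C^M(\mathbb{R})$ which is equal to one on $\mathrm{supp}(\zeta^s)$ and has compact support

$$\|y\zeta^s(y)\|_{H^{\gamma_s}}=\|y\chi(y)\zeta^s(y)\|_{H^{\gamma_s}}\lesssim\|y\chi(y)\|_{B^{\gamma_s}_{\infty,2}}\|\zeta^s(y)\|_{H^{\gamma_s}}\lesssim\|y\chi(y)\|_{C^M}<\infty,$$

where we used the pointwise multiplier property (48) of Besov spaces as well as the Besov embeddings (47) and (45). Thus $ix\,\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}\zeta^s](x)\in L^2(\mathbb{R})$. Applying this and the boundedness of $f_Y$ to the first term in (27) yields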



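$$\int_{-\infty}^{-N/2}|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}\zeta^s](x)|^2f_Y(x+t)\,dx\lesssim\int_{-\infty}^{-N/2}|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}\zeta^s](x)|^2\,dx$$

$$\le4N^{-2}\int_{-\infty}^{-N/2}|ix\,\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}\zeta^s](x)|^2\,dx\lesssim N^{-2}.$$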

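Using Hölder's inequality and the boundedness of $f_Y$, we estimate the second term in (27) by

$$\|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}\zeta^s](x)\|^2_{L^{2+\delta}}\Big(\int_{-N/2}^{\infty}|f_Y(x+t)|^{(2+\delta)/\delta}\,dx\Big)^{\delta/(2+\delta)}\lesssim\|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}\zeta^s](x)\|^2_{L^{2+\delta}}\Big(\int_{N/2}^{\infty}f_Y(x)\,dx\Big)^{\delta/(2+\delta)}.$$

While the first factor is finite according to our bound (20), which also holds when $\mathcal{F}K_h$ is omitted, the second one is of order $N^{-\delta}$ due to the finite $(2+\delta)$th moment of $\mathbb{P}$. Therefore, the second term in (27) decays polynomially. □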
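The $N^{-2}$ rate for the first term in (27) comes from the elementary observation that $1\le4x^2/N^2$ on $\{x\le-N/2\}$, so the tail mass of a function $w$ with $xw(x)\in L^2$ is at most $4N^{-2}\int|xw(x)|^2dx$. The sketch below illustrates this with a hypothetical stand-in $w(x)=1/(1+x^2)$ and a truncated integration range; neither the function nor the truncation point is taken from the text.

```python
import math

def trapezoid(func, a, b, n):
    """Composite trapezoidal rule with n subintervals on [a, b]."""
    step = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for k in range(1, n):
        total += func(a + k * step)
    return total * step

def w(x):
    # Hypothetical stand-in for the deconvolved function; x * w(x) is square integrable.
    return 1.0 / (1.0 + x * x)

def tail_mass(N, lower=-200.0, n=20000):
    """Integral of |w|^2 over [lower, -N/2] (truncated left tail)."""
    return trapezoid(lambda x: w(x) ** 2, lower, -N / 2.0, n)

def weighted_tail(N, lower=-200.0, n=20000):
    """4 N^{-2} times the integral of |x w(x)|^2 over the same range."""
    return 4.0 / N ** 2 * trapezoid(lambda x: (x * w(x)) ** 2, lower, -N / 2.0, n)

table = {N: (tail_mass(N), weighted_tail(N)) for N in (2.0, 8.0, 32.0)}
print(table)
```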

4.2.2 Uniform central limit theorem 

We recall the definition of the empirical process $\nu_n$ in (24).

Theorem 8. Grant Assumptions 1 and 2. Let

$$(\nu_n(t_1),\dots,\nu_n(t_k))\xrightarrow{\mathcal{L}}(\mathbb{G}_{t_1},\dots,\mathbb{G}_{t_k})$$

for all $t_1,\dots,t_k\in\mathbb{R}$ and for all $k\in\mathbb{N}$. If either $\gamma_s\le\beta+1/2$ and $h_n^{\rho}n^{1/4}\to\infty$ as $n\to\infty$ for some $\rho>\beta-\gamma_s+1/2$ or if $\gamma_s>\beta+1/2$, then

$$\nu_n\xrightarrow{\mathcal{L}}\mathbb{G}\quad\text{in }\ell^\infty(\mathbb{R}).$$
Proof. We split the empirical process $\nu_n$ into three parts

$$\nu_n(t)=\sqrt{n}\int(T_1(x)+T_2(x)+T_3(x))\,(\mathbb{P}_n-\mathbb{P})(dx),$$

where $T_1$, $T_2$ and $T_3$ correspond to the three terms in decomposition (19) and are given by (28), (29) and (30) below. For the first term

$$T_1(x)=\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-u)\mathcal{F}\zeta^s_t(u)\mathcal{F}K_h(u)](x)\quad(28)$$

we distinguish the two cases $\gamma_s>\beta+1/2$ and $\gamma_s\le\beta+1/2$. In the first case we will show that $T_1$ varies in a fixed Donsker class. In the second case the process indexed by $T_1$ is critical; this is where smoothed empirical processes and the condition on the bandwidth are needed. Tightness of $T_1$ in this case will be shown in Section 4.2.3. We will further show that the second term $T_2$ and the third term $T_3$ both vary in fixed Donsker classes for all $\gamma_s>\beta$. In particular the three processes indexed by $T_1$, $T_2$ and $T_3$, respectively, are tight. Applying the equicontinuity characterization of tightness [30, Thm. 1.5.7] with the maximum of the semimetrics yields that $\nu_n$ is tight. Since we have assumed convergence of the finite dimensional distributions, the convergence of $\nu_n$ in distribution follows [30, Thm. 1.5.4].

Here we consider only the first case $\gamma_s>\beta+1/2$. We recall that $\zeta^s_t$ is contained in $H^{\gamma_s}(\mathbb{R})$. By the Fourier multiplier property of the deconvolution operator in Lemma 5(i) and by $\sup_{h>0,u}|\mathcal{F}K_h(u)|\le\|K\|_{L^1}<\infty$ the functions $T_1$ are contained in a bounded set of $H^{1/2+\eta}(\mathbb{R})$ for some $\eta>0$ small enough. We apply [23, Prop. 1] with $p=q=2$ and $s=1/2+\eta$ and conclude that $T_1$ varies in a universal Donsker class.

The second term is of the form

$$T_2(x)=(1+ix)\,\mathcal{F}^{-1}\big[\varphi_\varepsilon^{-1}(-u)\,\mathcal{F}\big[\tfrac{1}{iy+1}\zeta^c_t(y)\big](u)\,\mathcal{F}K_h(u)\big](x).\quad(29)$$

By Assumption 2(ii) we have $|\varphi_\varepsilon^{-1}(u)|\lesssim\langle u\rangle^{\beta}$. For some $\eta>0$ sufficiently small, the functions $\tfrac{1}{iy+1}\zeta^c_t(y)$, $t\in\mathbb{R}$, are contained in a bounded set of $H^{\beta+\eta+1/2}(\mathbb{R})$ by Lemma 6. We obtain that the functions $T_2(x)/(1+ix)$ are contained in a bounded subset of $H^{1/2+\eta}(\mathbb{R})$. Corollary 5 in [23] yields with $p=q=2$, $\beta=-1$, $s=1/2+\eta$ and $\gamma=\eta$ that $T_2$ is contained in a fixed $\mathbb{P}$-Donsker class.

Similarly, we treat the third term

$$T_3(x)=\mathcal{F}^{-1}\big[(\varphi_\varepsilon^{-1})'(-u)\,\mathcal{F}\big[\tfrac{1}{iy+1}\zeta^c_t(y)\big](u)\,\mathcal{F}K_h(u)\big](x).\quad(30)$$

By Assumption 2(ii) we have $|(\varphi_\varepsilon^{-1})'(u)|\lesssim\langle u\rangle^{\beta-1}$. As above we conclude that the functions $T_3$ are contained in a bounded set of $H^{\eta+3/2}(\mathbb{R})$. By [23, Prop. 1] with $p=q=2$ and $s=\eta+3/2$ the term $T_3$ varies in a universal Donsker class. □
4.2.3 The critical term

In this section, we treat the first term $T_1$ in the case $\gamma_s\le\beta+1/2$. We define

$$q_t:=\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}\zeta^s_t].\quad(31)$$

For simplicity in point (e) below it will be convenient to work with functions of bounded support. Thus we fix $\xi>0$ and define the truncated kernel $K_h^{(0)}$.

By the Assumption (5) on the decay of $K$ we have $\sup_{h>0}\|K_h-K_h^{(0)}\|_{BV}<\infty$. We conclude $|\mathcal{F}(K_h-K_h^{(0)})(u)|\lesssim(1+|u|)^{-1}$ with a constant independent of $h>0$. By Assumption 2(ii) we have $|\varphi_\varepsilon^{-1}(u)|\lesssim(1+|u|)^{\beta}$. The functions $\zeta^s_t$, $t\in\mathbb{R}$, are contained in a bounded set of $H^{\gamma_s}(\mathbb{R})$. Consequently $T_1$ with $K_h-K_h^{(0)}$ instead of $K_h$ is contained in a bounded set of $H^{\gamma_s-\beta+1}(\mathbb{R})$. With the same argument as used for $T_3$ we see that this term is contained in a universal Donsker class because $\gamma_s-\beta+1>1$ by assumption. So it remains to consider $T_1$ with the truncated kernel $K_h^{(0)}$.

In order to show tightness of the process indexed by $T_1$ with the truncated kernel $K_h^{(0)}$ we check the assumptions of Theorem 3 by Giné and Nickl [14] in the version of Nickl and Reiß [24, Thm. 12] for the class $\mathcal{Q}=\{q_t\,|\,t\in\mathbb{R}\}$ and for $\mu_n(dx):=K_{h_n}^{(0)}(x)\,dx$, where $q_t(x)$ was defined in (31). By Section 4.2.1 the class $\mathcal{G}$ is $\mathbb{P}$-pregaussian. From the proof it also follows that $\mathcal{Q}$ is $\mathbb{P}$-pregaussian since this is just the case $\zeta^c=0$.

We write

$$\mathcal{Q}'_\tau:=\{r-q\,|\,r,q\in\mathcal{Q},\ \|r-q\|_{L^2(\mathbb{P})}\le\tau\}.$$

Let $\rho>\beta-\gamma_s+1/2\ge0$ be such that $h_n^{\rho}n^{1/4}\to\infty$. We fix some $\rho'\in(\beta-\gamma_s+1/2,\rho\wedge1)$ and obtain $h_n^{\rho'}\log(n)^{-1/2}n^{1/4}\to\infty$. We need to verify the following conditions.

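(a) We will show that the functions in $\mathcal{Q}_n:=\{q_t*\mu_n\,|\,t\in\mathbb{R}\}$ are bounded by $M_n:=Ch_n^{-\rho'}$ for some constant $C>0$. Since $q_t$ is only a translation of $q_0$ it suffices to consider $q_0$. By the definition of $Z^{\gamma_s,\gamma_c}$ in (6), by Lemma 5(i) and by the Besov embedding (47)

$$q_0=\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-u)\mathcal{F}\zeta^s(u)]\in B^{\gamma_s-\beta}_{2,2}(\mathbb{R})\subseteq B^{1/2-\rho'}_{2,\infty}(\mathbb{R}).$$

By our assumptions on the kernel (5) it follows that $K'$ is integrable and thus that $K$ is of bounded variation. Next we apply continuous embeddings for Besov spaces (44) and (46), (49) as well as the estimate for $\|K_{h_n}\|_{B^{\rho'}_{1,1}}$ in Giné and Nickl [14, p. 384], which also applies to truncated kernels, and obtain

$$\|q_0*K_{h_n}^{(0)}\|_\infty\lesssim\|q_0*K_{h_n}^{(0)}\|_{B^0_{\infty,1}}\lesssim\|q_0*K_{h_n}^{(0)}\|_{B^{1/2}_{2,1}}\lesssim\|q_0\|_{B^{1/2-\rho'}_{2,\infty}}\|K_{h_n}^{(0)}\|_{B^{\rho'}_{1,1}}\lesssim h_n^{-\rho'}.\quad(32)$$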



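(b) For $r\in\mathcal{Q}'_\tau$ we have $\|r*K_h^{(0)}\|_{L^2(\mathbb{P})}\le\|r*K_h^{(0)}-r\|_{L^2(\mathbb{P})}+\tau$. Thus it suffices to show that $\|q*K_h^{(0)}-q\|_{L^2(\mathbb{P})}\to0$ uniformly over $q\in\mathcal{Q}$. We estimate

$$\|q_t*K_h^{(0)}-q_t\|_{L^2(\mathbb{P})}\lesssim\|\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}\zeta^s(\mathcal{F}K_h^{(0)}-1)\|_{L^2}.$$

$\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}\zeta^s$ is an $L^2$-function and $\mathcal{F}K_h^{(0)}$ is uniformly bounded and converges to one as $h\to0$. By dominated convergence the integral converges to zero.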
(c) The estimates in (a) can be used to see that the classes $\mathcal{Q}_n$ have polynomial $L^2(Q)$ covering numbers, uniformly in all probability measures $Q$ and uniformly in $n$. The function $q_0*K_{h_n}^{(0)}$ is the convolution of two $L^2$-functions and thus continuous. The estimate (32) and embedding (50) yield that $q_0*K_{h_n}^{(0)}$ is of finite 2-variation. We argue as in Lemma 1 by Giné and Nickl [15]. As a function of bounded 2-variation, $q_0*K_{h_n}^{(0)}$ can be written as a composition $g_n\circ f_n$ of a nondecreasing function $f_n$ and a function $g_n$, which satisfies a Hölder condition $|g_n(u)-g_n(v)|\le|u-v|^{1/2}$, see, for example, [8, p. 1971]. More precisely, we can take $f_n(x)$ to be the 2-variation of $q_0*K_{h_n}^{(0)}$ up to $x$ and the envelopes of $f_n$ to be multiples of $M_n^2=C^2h_n^{-2\rho'}$. The set $F_n$ of all translates of the nondecreasing function $f_n$ has VC-index 2 and thus polynomial $L^1(Q)$-covering numbers [7, Thm. 5.1.15]. Since each $\epsilon^2$-covering of translates of $f_n$ for $L^1(Q)$ induces an $\epsilon$-covering of translates of $g_n\circ f_n$ for $L^2(Q)$ we can estimate the covering numbers by

$$N(\mathcal{Q}_n,L^2(Q),\epsilon)\le N(F_n,L^1(Q),\epsilon^2)\lesssim(M_n/\epsilon)^4,$$

with constants independent of $n$ and $Q$. The conditions for inequality (22) by Giné and Nickl [14] are fulfilled, where the envelopes are $M_n=Ch_n^{-\rho'}$ and $H_n(\eta)=H(\eta)=C_1\log(\eta)+C_0$ with $C_0,C_1>0$. Consequently



E* 



.7=1 



< max | — — — — , — — log(n) 



n 



1/4 



(Qn)'_! 







n 



/4 



as n — > oo. 



(d) We apply Lemma 1 of [14] to show that

$$\bigcup_{n\ge1}\mathcal{Q}_n=\bigcup_{n\ge1}\Big\{x\mapsto\int q_t(x-y)K_{h_n}^{(0)}(y)\,dy\ \Big|\ t\in\mathbb{R}\Big\}$$

is in the $L^2(\mathbb{P})$-closure of $\|K\|_{L^1}$-times the symmetric convex hull of the pregaussian class $\mathcal{Q}$. The condition $q_t(\cdot-y)\in L^2(\mathbb{P})$ is satisfied for all $y\in\mathbb{R}$ since $q_t\in L^2(\mathbb{R})$ and $f_Y$ is bounded. $q_t(x-\cdot)\in L^1(|\mu_n|)$ is fulfilled owing to $K_{h_n}^{(0)},q_t\in L^2(\mathbb{R})$. The third condition that $y\mapsto\|q_t(\cdot-y)\|_{L^2(\mathbb{P})}$ is in $L^1(|\mu_n|)$ holds likewise since $f_Y$ is bounded and

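(e) The $L^2(\mathbb{P})$-distance of two functions in $\mathcal{Q}_n$ can be estimated by

$$E\big[(q_t*K_h^{(0)}(X)-q_s*K_h^{(0)}(X))^2\big]^{1/2}=\Big\|\int q_t(\cdot-u)K_h^{(0)}(u)-q_s(\cdot-u)K_h^{(0)}(u)\,du\Big\|_{L^2(\mathbb{P})}$$

$$\le\int|K_h^{(0)}(u)|\,\|q_t(\cdot-u)-q_s(\cdot-u)\|_{L^2(\mathbb{P})}\,du\le\|K_h^{(0)}\|_{L^1}\sup_{|u|\le\xi}\|q_t(\cdot-u)-q_s(\cdot-u)\|_{L^2(\mathbb{P})}=\|K_h^{(0)}\|_{L^1}\sup_{|u|\le\xi}\|q_{t+u}-q_{s+u}\|_{L^2(\mathbb{P})}.$$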
As seen in the proof that $\mathcal{Q}$ is pregaussian, the covering numbers grow at most polynomially. We take $N$ large enough such that $N\ge2\xi$. Then $s,t>N$ implies $s+u,t+u>N/2$ and $s,t<-N$ implies $s+u,t+u<-N/2$. Since this is only a polynomial change in $N$, the growth of the covering numbers remains at most polynomial. This leads to the entropy bound $H(\mathcal{Q}_n,L^2(\mathbb{P}),\eta)\lesssim\log(\eta^{-1})$ for $\eta$ small enough and independent of $n$. We define $\lambda_n(\eta):=\log(\eta^{-1})\eta^2$. The bound in the condition is of the order $\log(n)^{-1/2}n^{1/4}$. As seen before (a), this grows faster than $M_n=Ch_n^{-\rho'}$.



5 Proof of the lower bound 



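First we show asymptotic linearity of $\hat\vartheta$.

Lemma 9. Supposing Assumptions 1 and 2 and $\zeta\in Z^{\gamma_s,\gamma_c}$ with $\gamma_s>\beta$ and $\gamma_c>(1/2\vee\alpha)+\gamma_s$, the estimator $\hat\vartheta$ with $h_n=o(n^{-1/(2\alpha+2\gamma_s)})$ is asymptotically linear with influence function $\int\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta(y)\,(\delta_x-\mathbb{P})(dy)$ and thus is Gaussian regular.

Proof. The analysis of the bias of $\hat\vartheta$ in Section 4.1.1 yields

$$\hat\vartheta=\vartheta+\int\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)\mathcal{F}K_h]*\zeta(y)\,(\mathbb{P}_n-\mathbb{P})(dy)+o_P(n^{-1/2})$$

$$=\vartheta+\int\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta(y)\,(\mathbb{P}_n-\mathbb{P})(dy)+\int\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)(\mathcal{F}K_h-1)]*\zeta(y)\,(\mathbb{P}_n-\mathbb{P})(dy)+o_P(n^{-1/2}).$$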



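Since

$$E\Big[\Big|\int\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta\,(d\delta_{Y_1}-d\mathbb{P})\Big|^2\Big]\le4E\big[|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta(Y_1)|^2\big]$$

is finite and $E[\int\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta\,(d\delta_{Y_1}-d\mathbb{P})]=0$ by (23), it suffices to show

$$\int\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)(\mathcal{F}K_h-1)]*\zeta(y)\,(\mathbb{P}_n-\mathbb{P})(dy)=o_P(n^{-1/2}).\quad(33)$$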

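For convenience we write $\psi_h:=\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)(\mathcal{F}K_h-1)]*\zeta$ and let $\tau>0$. Since $(Y_j)$ are independent and identically distributed, we obtain

$$P\Big(n^{1/2}\Big|\int\psi_h(y)(\mathbb{P}_n-\mathbb{P})(dy)\Big|>\tau\Big)\le\tau^{-2}n\,E\Big[\Big|\int\psi_h(y)(\mathbb{P}_n-\mathbb{P})(dy)\Big|^2\Big]$$

$$=\tau^{-2}n\,E\Big[\int\!\!\int\psi_h(y)\psi_h(z)\,(\mathbb{P}_n-\mathbb{P})(dy)\,(\mathbb{P}_n-\mathbb{P})(dz)\Big]$$

$$=\frac{1}{\tau^2n}\sum_{j,k=1}^nE\Big[\int\!\!\int\psi_h(y)\psi_h(z)\,(\delta_{Y_j}-\mathbb{P})(dy)\,(\delta_{Y_k}-\mathbb{P})(dz)\Big]$$

$$=\tau^{-2}E\Big[\Big|\int\psi_h(y)(\delta_{Y_1}-\mathbb{P})(dy)\Big|^2\Big]\le4\tau^{-2}\int|\psi_h(y)|^2\,\mathbb{P}(dy).$$

By uniform integrability of $\psi_h^2$ with respect to $\mathbb{P}$ by (23) and pointwise convergence $\psi_h\to0$ as $h\to0$ we conclude $\int|\psi_h(y)|^2\,\mathbb{P}(dy)\to0$ and thus (33). From asymptotic linearity follows Gaussian regularity by Proposition 2.2.1 of [1]. □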
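The Chebyshev step in the proof of Lemma 9 uses that, for i.i.d. observations, the cross terms $j\neq k$ vanish, so the second moment of $\sqrt{n}\int\psi_h\,d(\mathbb{P}_n-\mathbb{P})$ equals the variance of $\psi_h(Y_1)$ for every $n$. The sketch below verifies this exactly by enumerating all samples from a small discrete distribution; the three-point distribution and the values of `psi` are hypothetical stand-ins chosen for illustration.

```python
import itertools
import math

# Hypothetical three-point law P and values of psi on its atoms.
probs = [0.5, 0.3, 0.2]
psi = [1.0, -1.0, 3.0]

mean_psi = sum(p * v for p, v in zip(probs, psi))
variance = sum(p * (v - mean_psi) ** 2 for p, v in zip(probs, psi))
second_moment = sum(p * v * v for p, v in zip(probs, psi))

def scaled_fluctuations(n):
    """Yield (probability, sqrt(n) * int psi d(P_n - P)) over all samples of size n."""
    for sample in itertools.product(range(len(probs)), repeat=n):
        weight = math.prod(probs[i] for i in sample)
        empirical_mean = sum(psi[i] for i in sample) / n
        yield weight, math.sqrt(n) * (empirical_mean - mean_psi)

def exact_second_moment(n):
    """E[(sqrt(n) * int psi d(P_n - P))^2], computed by full enumeration."""
    return sum(w * z * z for w, z in scaled_fluctuations(n))

def exact_tail_probability(n, tau):
    """P(|sqrt(n) * int psi d(P_n - P)| > tau), computed by full enumeration."""
    return sum(w for w, z in scaled_fluctuations(n) if abs(z) > tau)

for n in (1, 2, 5):
    print(n, exact_second_moment(n), variance, exact_tail_probability(n, 2.0))
```

The second moment does not depend on $n$, and the tail probability stays below $\tau^{-2}$ times the variance, which in turn is bounded by $\tau^{-2}\int\psi^2\,d\mathbb{P}$.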

Let us now briefly discuss the consequence of Assumption 4 in terms of Fourier multipliers. Standard calculus yields $|(\varphi_\varepsilon^{-1})^{(k)}(u)|\lesssim\langle u\rangle^{\beta-k}$ for $k=0,\dots,(\lfloor\beta\rfloor\vee M)+1$. With the same arguments as in the proof of Lemma 5(i) we deduce that

$$(1+iu)^{\beta+k}\varphi_\varepsilon^{(k)}(u)\quad\text{and}\quad(1+iu)^{-\beta+k}(\varphi_\varepsilon^{-1})^{(k)}(u)\quad(34)$$

are Fourier multipliers on $B^s_{p,q}(\mathbb{R})$ for all $s\in\mathbb{R}$, $p,q\in[1,\infty]$ and $k=0,\dots,\lfloor\beta\rfloor\vee M$.

5.1 Information bound for smooth $\zeta$

In this subsection we prove Theorem 2. 

Step 1: To determine the solution of the maximization problem (10), we define $h:=Sg=(g*f_\varepsilon)f_Y^{-1/2}$ with score operator $S$ such that the Fisher information (9) satisfies $\langle Ig,g\rangle=\|h\|^2_{L^2}$. Therefore, we obtain $g=S^{-1}h=\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}]*(\sqrt{f_Y}h)$. Owing to the adjoint equation (12), $\langle g,\zeta\rangle=\int(\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta)\sqrt{f_Y}h=\langle h,(S^{-1})^*\zeta\rangle$ holds. Ignoring all restrictions on $g$, the supremum is thus attained at

$$h^*:=(S^{-1})^*\zeta=(\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta)\sqrt{f_Y}.\quad(35)$$

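Let us define $\bar\beta:=\lfloor\beta+1/2\rfloor+1$ and $r:=\mathcal{F}^{-1}[(1+iu)^{-\bar\beta}\varphi_\varepsilon^{-1}(u)]$. Because of Lemma 5(ii) we obtain $r\in L^1(\mathbb{R})\cap L^2(\mathbb{R})$ and $\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}]=r*(\mathrm{Id}-D)^{\bar\beta}$. Therefore, the condition $\int g=0$, Fubini's theorem and the fundamental theorem of calculus, provided $(\sqrt{f_Y}h)^{(k)}\in L^1(\mathbb{R})$, $k=0,\dots,\bar\beta$, imply

$$0=\int g=\int r\,\Big(\int\sqrt{f_Y}h+\sum_{k=1}^{\bar\beta}\binom{\bar\beta}{k}(-1)^k\int(\sqrt{f_Y}h)^{(k)}\Big).$$

For each $k=1,\dots,\bar\beta$ the integrability of $(\sqrt{f_Y}h)^{(l)}$, $l=k-1,k$, then yields $\int(\sqrt{f_Y}h)^{(k)}=\lim_{x\to\infty}\big((\sqrt{f_Y}h)^{(k-1)}(x)-(\sqrt{f_Y}h)^{(k-1)}(-x)\big)=0$ and thus

$$0=\int\sqrt{f_Y}h\quad(36)$$

since $\int r=\mathcal{F}r(0)=1$. Hence, we should project the solution $h^*$ onto the $L^2$-orthogonal space $\mathrm{span}\{\sqrt{f_Y}\}^{\perp}$: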

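$$h^{**}:=h^*-\langle h^*,\sqrt{f_Y}\rangle\sqrt{f_Y}\quad(37)$$

$$=\Big(\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta-\int(\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta)f_Y\Big)\sqrt{f_Y}=\Big(\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta-\int\zeta f_X\Big)\sqrt{f_Y},$$

where we used $\int(\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta)f_Y=\int\zeta f_X$ by (12). This leads to the candidate for the maximization of (10) given by

$$g^*:=S^{-1}h^{**}=S^{-1}(S^{-1})^*\zeta-\langle\zeta,f_X\rangle S^{-1}\sqrt{f_Y}=I^{-1}\zeta-\langle\zeta,f_X\rangle f_X$$

$$=\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}]*\{(\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta)f_Y\}-\Big(\int\zeta f_X\Big)f_X$$

and (12) yields $\langle g^*,\zeta\rangle=\langle Ig^*,g^*\rangle$ and the bound

$$\mathcal{I}_\zeta=\langle Ig^*,g^*\rangle=\int(\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta)^2f_Y-\Big(\int\zeta f_X\Big)^2.\quad(38)$$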
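The projection step (37) and the resulting bound (38) can be illustrated numerically. The sketch below assumes, purely for illustration, that $f_Y$ is the standard normal density and that $\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta$ is the hypothetical function $\kappa(x)=1+x$; it then checks that $h^{**}$ is orthogonal to $\sqrt{f_Y}$ and that $\|h^{**}\|^2_{L^2}=\int\kappa^2f_Y-(\int\kappa f_Y)^2$, which has the same form as the right-hand side of (38).

```python
import math

def trapezoid(func, a, b, n):
    """Composite trapezoidal rule with n subintervals on [a, b]."""
    step = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for k in range(1, n):
        total += func(a + k * step)
    return total * step

def f_Y(x):
    # Assumed observation density (standard normal), for illustration only.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def kappa(x):
    # Hypothetical stand-in for the deconvolved functional.
    return 1.0 + x

def h_star(x):
    return kappa(x) * math.sqrt(f_Y(x))

A, B, N_GRID = -20.0, 20.0, 8000
inner = trapezoid(lambda x: h_star(x) * math.sqrt(f_Y(x)), A, B, N_GRID)

def h_projected(x):
    """h** = h* - <h*, sqrt(f_Y)> sqrt(f_Y), using that sqrt(f_Y) has unit L2 norm."""
    return h_star(x) - inner * math.sqrt(f_Y(x))

orthogonality = trapezoid(lambda x: h_projected(x) * math.sqrt(f_Y(x)), A, B, N_GRID)
norm_sq_projected = trapezoid(lambda x: h_projected(x) ** 2, A, B, N_GRID)
second_moment = trapezoid(lambda x: kappa(x) ** 2 * f_Y(x), A, B, N_GRID)

print(orthogonality, norm_sq_projected, second_moment - inner ** 2)
```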

Inequality (11) then holds by the local version of the Hájek-Le Cam convolution theorem [1, Thm. 2.3.1]. It remains to check the conditions in (8), $(\sqrt{f_Y}h^{**})^{(k)}\in L^1(\mathbb{R})$ for $k=0,\dots,\bar\beta$ and that the three-fold application of the adjoint equality is allowed. The latter will follow from $\sqrt{f_Y}h^{**},f_Y\in H^{\beta^+}(\mathbb{R})$ for some $\beta^+>\beta$.

Step 2: We now prove the integrability of $\sqrt{f_Y}h^{**}=(\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*\zeta-\int\zeta f_X)f_Y$ and its derivatives up to order $\bar\beta$, which makes the calculation (36) rigorous. For convenience we denote



K :=.F-i[^-i(-.)]* C 



r * 



E(f)(-D*c<" 



Owing to Young's inequality together with r G L 1 (R) Pi L 2 (R) and G L 2 (R) for any 
k ^ 0, we obtain k G C S (R) n H S (R) for any s ^ 0. It suffices to show ff ] G L 1 (R) for 
fc = 0, . . . , /3. Note that by (34) 

||(Id + D) fe / £ ||Li < H^Kl-^V.lllsO, < ||^- 1 [(l-<«)*- /, ]llflJ, 1 

is finite for f3 > k since then - ^) fe ~ /? ] = 7/3-fe,i G (R) £ ^(R) by the proof of 

Lemma 5(ii). Recalling that /3 ^ Z, we conclude iteratively /J*^ G L X (R) for A; = 0, . . . , [f3\ . 
Therefore 

ii/f lu^ii/r^ii^n/i^iu^oo 

by Assumption 3 and similarly for derivatives of lower order. 
Moreover, we conclude for j3~ G (j3 + 1/2, (3) that 

fy G i?f~(R) C S^" 1/2 (R) C tf^ + (R) 

for some /3 + > /3 by the embeddings (46) and (46). Since also k/y G ^ + (R), using k G C S (R) 
for s > (3, we can apply the adjoint equality (12) in Step 1. 

Step 3: We will show now ||<7*//x||oo < 00 which justifies fx ± rg* ^ for some choice of 
r > small enough. 

By Step 1 g* = J 7 ~ 1 [i^~ 1 ] * (nfy) — (£, fx) fx- For the second term Assumption 3 implies 
||(C,/x)/x//x||oo ^ IICIIlHI/xHooII/xIIl 1 < oo- Hence, we only need to show J 7-1 ^ 1 ] * 
( K fy) ^5 fx- Using the Besov embedding (45), the Fourier multiplier property of (34) and 
the pointwise multiplier property of Besov spaces (48), we obtain for some /3 + G (j3, [j3\ + 1) 

WF^We 1 ] * ( K /y)IU < ||«/H| B f ^INIb- 9 11/HIb" < M\c°\\fY\\ c i3+- 



00,1 OO.l OO.l 
for any $s>\beta$. In Step 2 we have seen that $\kappa\in C^s(\mathbb{R})$. Moreover,



L/3J 



WfY\\ C fi+ =J^II/y Hoc + sup 



fc=0 

L/3J 



( X _y)/3+-L/3J 



fc=0 ^ • 

L/3J 



| x _ y |/3+-L/3J 

^||/x||ooEll/i fe) H^ + II^IU+-L.jll/i L/3J) |lL 1 <00, 



(39) 



k=0 



using the Besov embedding f x G W?(R) C flf* L/3J+1 C C^-L/JJ . Hence, <f G L°°(R). Since 
/x is a continuous, strictly positive function, we conclude that the quotient g* / fx is bounded 
on every compact subset of R. Therefore, it suffices to estimate the tails. For \x\ large enough 
Assumption 3 implies, using again (34), 

\F- l [^- l ]*{Kfy){x) 



fx(x) 



<\x M {F-\Ve l ]*^f Y )){ X )\ 
M 



k=0 
M 



<Y,\\y M - k Kfy\\ B ^ 



k=0 

Note that the above calculation shows that (pj l is a Fourier multiplier on the weighted Besov 
space with weight function (x) M [cf. 10, Def. 4.2.1/2 and Thm. 5.4.2]. Each term in the above 
sum can be estimated by 

\\y M - h 4c4fY\\ cf3+ 

M-k 



\M-k-l 



1=0 



0] c .\\fr\\c*+> 



where with abuse of notation (3 + < [f3\ +1 is slightly larger in the last line and s > (3 + . By (39) 



we have f Y G C^ + (R). Now, 



IX 



M-k-l 



C G ^(R) is again a Schwartz function and thus it 



suffices to show T' 1 ^' 1 )^ {-u) Fx] G C S (R) for s > f3, X G y(R) and k = 0, . . . , M. For 
/c = this is already done in Step 2. We proceed analogously: for any integer s)0we have 

ii j-- 1 K^H-^m^w- 



)VCL 



J- 1 [(l + iu^-P+Q^fo^Wi-u)] * ((Id-D)^- fe )-"x 

^||(1 + ^)(-/3+fc)A0 (v9 -l ) (fc) ( _ n) ||^|| D -(I d _ D )(^- fc )V0 x || i2 

<||(u)^ +fc ) A °) + ^|| L2 ||D s (Id-D)^- fc ) v0 x|| L2 . 
Owing to P > f3 + 1/2, the first factor is finite since 

-1/2, foip^k, 



((-/3 + k) A 0) + f3 - k ^ 



P - 1/2 - k < -1/2, for/3<£; 



and the second factor is the L 2 -norm of a Schwartz function and thus finite, too. 
5.2 Approximation lemma

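To prove convergence of the information bounds it suffices to show that

$$\langle g_n^*,\zeta\rangle\to\langle(S^*)^{-1}\zeta,(S^*)^{-1}\zeta\rangle-\langle\zeta,f_X\rangle^2\quad\text{and}\quad(40)$$

$$\langle Sg_n^*,Sg_n^*\rangle=\langle g_n^*,\zeta_n\rangle\to\langle(S^*)^{-1}\zeta,(S^*)^{-1}\zeta\rangle-\langle\zeta,f_X\rangle^2,\quad(41)$$

where we used the equality $\langle g_n^*,\zeta_n\rangle=\langle Ig_n^*,g_n^*\rangle=\langle Sg_n^*,Sg_n^*\rangle$, which holds naturally for the maximizer of the information bound $\mathcal{I}_{\zeta_n}$. For (40) we note

$$\langle g_n^*,\zeta\rangle=\langle I^{-1}\zeta_n,\zeta\rangle-\langle\zeta_n,f_X\rangle\langle\zeta,f_X\rangle=\langle(S^*)^{-1}\zeta_n,(S^*)^{-1}\zeta\rangle-\langle\zeta_n,f_X\rangle\langle\zeta,f_X\rangle,$$

where the Cauchy-Schwarz inequality yields

$$|\langle f_X,\zeta_n-\zeta\rangle|=|\langle(S^*)^{-1}(\zeta_n-\zeta),\sqrt{f_Y}\rangle|\le\|(S^*)^{-1}(\zeta_n-\zeta)\|_{L^2}\quad(42)$$

and

$$|\langle(S^*)^{-1}\zeta_n-(S^*)^{-1}\zeta,(S^*)^{-1}\zeta\rangle|=|\langle(S^*)^{-1}(\zeta_n-\zeta),(S^*)^{-1}\zeta\rangle|\to0$$

as $n\to\infty$. Analogously follows (41), where we use that the assumption of the lemma implies $\langle(S^*)^{-1}\zeta_n,(S^*)^{-1}\zeta_n\rangle\to\langle(S^*)^{-1}\zeta,(S^*)^{-1}\zeta\rangle$ as $n\to\infty$. The second part of the claim, $\vartheta_{\zeta_n}\to\vartheta_\zeta$, has already been shown in the estimate (42).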

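5.3 Information bound for non-regular $\zeta$

To prove the efficiency of $\hat\vartheta_t$ for $t\in\mathbb{R}$ in Theorem 4, it suffices by Lemma 3 and (35) to show

$$\langle(S^*)^{-1}(\zeta_n-\zeta),(S^*)^{-1}(\zeta_n-\zeta)\rangle^{1/2}=\|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*(\zeta_n-\zeta)\|_{L^2(\mathbb{P})}\to0\quad(43)$$

as $n\to\infty$. Using the moment bound (23) replacing $\mathcal{F}K_h$ by $1$, we obtain

$$\|\mathcal{F}^{-1}[\varphi_\varepsilon^{-1}(-\cdot)]*(\zeta_n-\zeta)\|_{L^2(\mathbb{P})}\lesssim\|\zeta_n-\zeta\|_{Z^{\beta+\delta,1/2+\beta+\delta}}.$$

By assumption we have $Z^{\gamma_s,\gamma_c}\subseteq Z^{\beta+\delta,1/2+\beta+\delta}$ for $\delta$ small enough. Because the space of Schwartz functions is dense in every Sobolev space $H^s(\mathbb{R})$, $s\ge0$, $\mathscr{S}(\mathbb{R})$ is also dense in $Z^{\gamma_s,\gamma_c}$ and thus the information bound (11) holds for all $\zeta\in Z^{\gamma_s,\gamma_c}$. Finally, applying Theorem 25.48 of [29] and Theorem 7 from above completes the proof of Theorem 4.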

A Appendix: Function spaces 

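Let us define the $L^p$-Sobolev space for $p\in(0,\infty)$ and $m\in\mathbb{N}$

$$W^m_p(\mathbb{R}):=\Big\{f\in L^p(\mathbb{R})\ \Big|\ \sum_{k=0}^m\|f^{(k)}\|_{L^p}<\infty\Big\}.$$

In particular, $W^0_p(\mathbb{R})=L^p(\mathbb{R})$. Due to the Hilbert space structure, the case $p=2$ is crucial. It can be described equivalently with the notation $\langle u\rangle=(1+u^2)^{1/2}$ by, $\alpha\ge0$,

$$H^\alpha(\mathbb{R}):=\Big\{f\in L^2(\mathbb{R})\ \Big|\ \|f\|^2_{H^\alpha}:=\int\langle u\rangle^{2\alpha}|\mathcal{F}f(u)|^2\,du<\infty\Big\}$$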
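which we call Sobolev space, too. Obviously, $W^m_2(\mathbb{R})=H^m(\mathbb{R})$. Also frequently used are the Hölder spaces. Denoting the space of all bounded, continuous functions with values in $\mathbb{R}$ as $C(\mathbb{R})$ we define, $\alpha\ge0$,

$$C^\alpha(\mathbb{R}):=\Big\{f\in C(\mathbb{R})\ \Big|\ \|f\|_{C^\alpha}:=\sum_{k=0}^{\lfloor\alpha\rfloor}\|f^{(k)}\|_\infty+\sup_{x\neq y}\frac{|f^{(\lfloor\alpha\rfloor)}(x)-f^{(\lfloor\alpha\rfloor)}(y)|}{|x-y|^{\alpha-\lfloor\alpha\rfloor}}<\infty\Big\}$$

where $\lfloor\alpha\rfloor$ denotes the largest integer smaller than or equal to $\alpha$. A unifying approach which contains all function spaces defined so far is given by Besov spaces [28, Sect. 2.3.1], which we will discuss in the sequel. Let $\mathscr{S}(\mathbb{R})$ be the Schwartz space of all rapidly decreasing infinitely differentiable functions with values in $\mathbb{C}$ and $\mathscr{S}'(\mathbb{R})$ its dual space, that is, the space of all tempered distributions. Let $0\le\psi\in\mathscr{S}(\mathbb{R})$ with $\mathrm{supp}\,\psi\subseteq\{x\,|\,1/2\le|x|\le2\}$ and $\psi(x)>0$ if $1/2<|x|<2$. Then define $\varphi_j(x):=\psi(2^{-j}x)\big(\sum_{k=-\infty}^{\infty}\psi(2^{-k}x)\big)^{-1}$, $j=1,2,\dots$, and $\varphi_0(x):=1-\sum_{j=1}^{\infty}\varphi_j(x)$ such that the sequence $\{\varphi_j\}_{j=0}^{\infty}$ is a smooth resolution of unity. In particular, $\mathcal{F}^{-1}[\varphi_j\mathcal{F}f]$ is an entire function for all $f\in\mathscr{S}'(\mathbb{R})$. For $s\in\mathbb{R}$ and $p,q\in(0,\infty]$ the Besov spaces are defined by

$$B^s_{p,q}(\mathbb{R}):=\Big\{f\in\mathscr{S}'(\mathbb{R})\ \Big|\ \|f\|_{B^s_{p,q}}:=\Big(\sum_{j=0}^{\infty}2^{sjq}\|\mathcal{F}^{-1}[\varphi_j\mathcal{F}f]\|^q_{L^p}\Big)^{1/q}<\infty\Big\}.$$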
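The dyadic resolution of unity used in the definition of the Besov spaces can be constructed explicitly. The sketch below uses one hypothetical choice of $\psi$, a standard smooth bump that is positive exactly on $1/2<|x|<2$, and builds $\varphi_j$ as in the definition above; the truncation of the sums to finitely many dyadic scales and the function names are implementation choices made here for illustration.

```python
import math

def psi(x):
    """Smooth bump, positive exactly on 1/2 < |x| < 2 (one hypothetical choice)."""
    r = abs(x)
    if r <= 0.5 or r >= 2.0:
        return 0.0
    return math.exp(-1.0 / ((r - 0.5) * (2.0 - r)))

def dyadic_sum(x, k_range=60):
    """Sum of psi(2^{-k} x) over k = -k_range..k_range; other terms vanish for moderate |x|."""
    return sum(psi(x * 2.0 ** (-k)) for k in range(-k_range, k_range + 1))

def phi(j, x, j_max=60):
    """phi_j for j >= 1 as a normalized dyadic bump, and phi_0 = 1 - sum_{j >= 1} phi_j."""
    if j >= 1:
        total = dyadic_sum(x)
        return psi(x * 2.0 ** (-j)) / total if total > 0.0 else 0.0
    return 1.0 - sum(phi(i, x) for i in range(1, j_max + 1))

# phi_0 equals one near the origin and vanishes for |x| >= 2,
# while phi_j for j >= 1 lives on the dyadic annulus 2^(j-1) < |x| < 2^(j+1).
print(phi(0, 0.3), phi(0, 1.5), phi(0, 5.0), phi(3, 3.9), phi(3, 8.0))
```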



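We omit the dependence of $\|\cdot\|_{B^s_{p,q}}$ on $\psi$ since any function with the above properties defines an equivalent norm. Setting the Besov spaces in relation to the more elementary function spaces, we first note that the Schwartz functions $\mathscr{S}(\mathbb{R})$ are dense in every Besov space $B^s_{p,q}$ with $p,q<\infty$ and $H^\alpha(\mathbb{R})=B^\alpha_{2,2}(\mathbb{R})$ as well as $C^\alpha(\mathbb{R})=B^\alpha_{\infty,\infty}(\mathbb{R})$, where the latter holds only if $\alpha$ is not an integer [28, Thms. 2.3.3 and 2.5.7]. Frequently used are the following continuous embeddings which can be found in [28, Sect. 2.5.7, Thms. 2.3.2(1), 2.7.1]: For $p\ge1$, $m\in\mathbb{N}$

$$B^m_{p,1}(\mathbb{R})\subseteq W^m_p(\mathbb{R})\subseteq B^m_{p,\infty}(\mathbb{R})\quad\text{and}\quad B^0_{\infty,1}(\mathbb{R})\subseteq L^\infty(\mathbb{R})\subseteq B^0_{\infty,\infty}(\mathbb{R})\quad(44)$$

and for $s>0$

$$B^s_{\infty,1}(\mathbb{R})\subseteq C^s(\mathbb{R})\subseteq B^s_{\infty,\infty}(\mathbb{R}).\quad(45)$$

Furthermore, for $0<p_0\le p_1\le\infty$, $0<q\le\infty$ and $-\infty<s_1\le s_0<\infty$

$$B^{s_0}_{p_0,q}(\mathbb{R})\subseteq B^{s_1}_{p_1,q}(\mathbb{R})\quad\text{if }s_0-\frac{1}{p_0}\ge s_1-\frac{1}{p_1}\quad(46)$$

and for $0<p,q_0,q_1\le\infty$ and $-\infty<s_1<s_0<\infty$

$$B^{s_0}_{p,q_0}(\mathbb{R})\subseteq B^{s_1}_{p,q_1}(\mathbb{R}).\quad(47)$$

Another important relation is the pointwise multiplier property of Besov spaces [28, (24) on p. 143], that is

$$\|fg\|_{B^s_{p,q}}\lesssim\|f\|_{B^s_{\infty,q}}\|g\|_{B^s_{p,q}}\quad(48)$$

for $s>0$, $1\le p\le\infty$ and $0<q\le\infty$.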
The Besov norm of a convolution can be bounded by Lemma 7 (i) in [25]. Let $1\le p,q,r,s\le\infty$, $-\infty<\alpha,\beta<\infty$, $0\le1/u=1/p+1/r-1\le1$, $0\le1/v=1/q+1/s\le1$. For $f\in B^\alpha_{p,q}(\mathbb{R})$ and $g\in B^\beta_{r,s}(\mathbb{R})$

$$\|f*g\|_{B^{\alpha+\beta}_{u,v}}\lesssim\|f\|_{B^\alpha_{p,q}}\|g\|_{B^\beta_{r,s}}.\quad(49)$$

Using for any function $f:\mathbb{R}\to\mathbb{R}$ and $h\in\mathbb{R}$ the difference operators $\Delta^1_hf(x):=f(x+h)-f(x)$ and $(\Delta^l_hf)(x):=\Delta^1_h(\Delta^{l-1}_hf)(x)$, $l\in\mathbb{N}$, the Besov norm can be equivalently described by

$$\|f\|_{L^p}+\|f\|_{\dot B^s_{p,q}}\quad\text{with}\quad\|f\|_{\dot B^s_{p,q}}:=\Big(\int|h|^{-sq-1}\|\Delta^M_hf\|^q_{L^p}\,dh\Big)^{1/q}$$

for $s>0$, $p,q\ge1$ and any integer $M>s$ [28, Thm. 2.5.12]. The space of all $f\in\mathscr{S}'(\mathbb{R})$ for which $\|f\|_{\dot B^s_{p,q}}$ is finite is called homogeneous Besov space $\dot B^s_{p,q}(\mathbb{R})$ [28, Def. 5.1.3/2, Thm. 2.2.3/2] and thus $B^s_{p,q}(\mathbb{R})=L^p(\mathbb{R})\cap\dot B^s_{p,q}(\mathbb{R})$ for $s>0$, $p,q\ge1$. Of interest is the relation of homogeneous Besov spaces to functions of bounded $p$-variation. Let $\mathcal{BV}_p(\mathbb{R})$ denote the space of measurable functions $f:\mathbb{R}\to\mathbb{R}$ such that there is a function $g$ which coincides with $f$ almost everywhere and satisfies

$$\sup\Big\{\sum_{i=1}^n|g(x_i)-g(x_{i-1})|^p\ \Big|\ -\infty<x_1<\dots<x_n<\infty,\ n\in\mathbb{N}\Big\}<\infty$$

and we define $BV_p(\mathbb{R})$ as the quotient set $\mathcal{BV}_p(\mathbb{R})$ modulo equality almost everywhere. Then,

$$\dot B^{1/p}_{p,1}(\mathbb{R})\subseteq BV_p(\mathbb{R})\subseteq\dot B^{1/p}_{p,\infty}(\mathbb{R}),\quad\text{for }p>1\quad(50)$$

by [4, Thm. 5]. For $p=1$ we have by [14, Lem. 8]

$$BV_1(\mathbb{R})\cap L^1(\mathbb{R})\subseteq B^1_{1,\infty}(\mathbb{R}).\quad(51)$$
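To make the notion of $p$-variation concrete, the following sketch evaluates the sum $\sum_i|g(x_i)-g(x_{i-1})|^p$ along a given partition. The function name `p_variation_sum` and the example partitions are illustrative; the $p$-variation itself is the supremum of such sums over all partitions, which the sketch does not compute.

```python
def p_variation_sum(g, points, p):
    """Sum of |g(x_i) - g(x_{i-1})|^p along the increasing partition `points`."""
    return sum(abs(g(b) - g(a)) ** p for a, b in zip(points, points[1:]))

def identity(x):
    return x

coarse = [0.0, 1.0]
fine = [k / 10.0 for k in range(11)]

# For p = 1 and a monotone function every partition of [0, 1] gives the same sum.
print(p_variation_sum(identity, coarse, 1), p_variation_sum(identity, fine, 1))
# For p = 2 refining the partition lowers the sum, so the supremum is attained
# on coarse partitions for this function.
print(p_variation_sum(identity, coarse, 2), p_variation_sum(identity, fine, 2))
```

For $p>1$ the sums are not monotone under refinement, which is why the definition takes a supremum over all finite partitions rather than a limit along fine ones.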